February, 1949


Journal of Applied Psychology


Editor: Donald G. Paterson, University of Minnesota


Consulting Editors 


; Walter V. Bingham reriyef 


amienton Ls Been cote 


=, Saami ri msi: a 


Unieaiity; Donald B, Super; T 


. University of Pennsylvania; Alfred 





Table of Contents : 


The Psychological Barometer on Communism, Americanism and Socialism: H. C. Link and A. D. Freiberg
An Analysis of the 1948 Polling Predictions: D. Katz
Note on the Cardall Practical Judgment Test: D. H. Carrington
Originality D. Store D Personnel: 
re 2 eee Ip Teen tr Sent mL Wiis 


The Rosenzweig Picture-Frustration Study in the Selection of Department Store Section Managers: H. W. Sinaiko


The Rorschach as a Predictor of Academic Success: B. R. McCandless


The OL Key oe ae) cinenrar nae brcer Sar gene Sehblantie sees 
cess at C S. R. Ostrom 
has dnin’ Ss ot neal wale ape: E. L. THornprxe 
A Fallacy in the Use of Median Scale Values in Employee Check Lists: C. E. Jurgensen

An Empirical Approach to a Problem of Psychophysical Scaling: W. H. Angoff
41360. alent iad isir ne ‘or Rating Performance of Industrial E 
icine N. tee oe E. J. Employees: 

Flesch Count and Readership of Articles in a Midwestern Farm Paper: H. B. Lemay


Speed of Reading Nine Point Type in Relation to Line Width and Leading: M. A. Tinker and D. G. Paterson


Effect of Target Brightness on “Normal” and “Subnormal” Visual Acuity: J. E. Kuntz and R. B. Sleight







Published Bi-monthly by The American Psychological Association, Inc. 


Prince and Lemon Sts., Lancaster, Pa., and 
1515 Massachusetts Ave., NW, Washington 5, D. C. 


Entered as second-class matter, August 19, 1943, at the post office at Lancaster, Pa., under the act of March 3,


f the of for in the Act of F 28, 1925, 
Repu te eins st Pi eects Sak PL. and ref 


Or eae shat by The American Psychological Association, Inc. 


1879 





Journal of Applied Psychology


Published bi-monthly by the American Psychological Association, Inc.
Annual subscription, $6.00; single copies, $1.25. Subscriptions,
orders, and other business communications should be addressed to the American
Psychological Association, Inc., 1515 Massachusetts Avenue, N. W., Washington 5, D. C.


Articles for publication should be sent to the Editor, Donald G. Paterson, Department
of Psychology, University of Minnesota, Minneapolis 14, Minn.


This Journal gives prompt consideration to manuscripts reporting original
investigations in any field of applied psychology except clinical and consulting psychology. A
descriptive or theoretical article is occasionally accepted if it deals in a distinctive manner
with a problem of applied psychology. The policy is, however, to favor papers dealing
with quantitative investigations of direct value to psychologists working in the following
fields: vocational diagnosis and occupational guidance; educational diagnosis, prediction
and guidance at the secondary school level and higher; personnel selection, training,
placement, transfer and promotion in business, industry and government service including the
armed forces; supervisory training in business, industry and government; illumination,
ventilation and fatigue in industry; job analysis, description, classification and evaluation;
measurement of morale of executives, supervisors, or employees; surveys of opinion on
social or political issues, such as those conducted by The Psychological Corporation;
psychological problems in market research and in advertising.


Articles may be under 500 words. The maximum is 16,000 words, the average in
the neighborhood of 4,000 words. To reduce lag of publication, adherence to the rule of
“brevity consistent with clarity” is encouraged.


A lapse of six to twelve months occurs between acceptance of an article and its
publication, the lag varying with the rate at which manuscripts are submitted. If,
however, an author is prepared to defray the costs of printing the necessary extra pages, he
may arrange for earlier publication without thereby postponing the appearance of
manuscripts by other contributors. This enables the management to provide space in addition
to the scheduled 80 pages per issue. “Early publication” is thus a direct contribution to
the subscribers. By cutting down lag in publication, it also benefits those authors whose
articles are published in regular turn.


Tables, footnotes and references as well as text of manuscripts should be typed 
double-spaced throughout. Authors should adhere to the conventions described by J. E. 
Anderson and W. L. Valentine in “The preparation of articles for publication in the
journals of the American Psychological Association,” Psychol. Bull., 1944, 41, 345-376. 
A reprint of this article will be loaned to any prospective contributor who does not find 
it in his library. 





Journal of Applied Psychology 








Vol. 33, No. 1 February, 1949 








Attitude Research in the Army * 
Jack Elinson 


Troop Attitude Research Branch, Troop Information and Education 
Division, Special Staff, United States Army 


Many Army policies affecting troops depend on soldier reactions and 
cooperation for success. Necessarily then, those formulating policies need 
to know soldier reactions. Obviously, the larger an organization is, the 
harder it is for top management to keep in touch with what the troops 
(or employees) are thinking. Attitude surveys and opinion polls of 
soldiers are a carefully developed means of helping higher headquarters 
keep well-informed on these matters—as well informed as a good company 
commander or good supervisor can be as a result of getting around in his 
company or unit and talking and working with his men. Such surveys 
can determine: 


In case of an existing policy . . . 
. do men know about it and understand it? 
. are they in sympathy with it? 
. do they feel it is being carried out as intended? 


In case of a proposed policy .. . 
. what is likely to be its effect? 
. how are men likely to react to it? 


Research on troop opinion helps provide answers to such questions. 
Broadly speaking, attitude research functions for Army administra- 
tion in the following five ways: 


1. As a means of anticipating troop reaction to a new administrative 
policy. 

* This article was originally prepared as an administrative memorandum for the use 
of Lieutenant General Willard S. Paul, Director, Personnel and Administrative General 
Staff, USA. As such it represents the thinking of Major Paul D. Guernsey, Chief, 
Troop Attitude Research Branch, and Ira H. Cisin, Sr. Analyst in Charge of Unit 
Studies, as well as that of the writer, who is Sr. Analyst in Charge of Surveys. 




2. As a guide in the formulation of administrative policy or of a change 
in policy. 

3. As a means of evaluating the operation of an existing administra- 
tive program. 

4. As a means of evaluating, experimentally, the effectiveness of an 
information or training program. 

5. As a source of quantitative information and evidence in support 
of or against a proposed policy or change in policy. 


Anticipating Troop Reaction 


1. Troop Attitude Toward Army’s New Career Guidance Program. The 
Army’s new Career Guidance Program for enlisted personnel involves 
the establishment of a systematic promotion ladder within each type of 
service. Advancement up the ladder will be based essentially on Army- 
wide competitive testing. It was planned that the program would first 
go into effect for men in the Infantry. Before all the details of the 
Career Guidance Program had been decided upon, an attitude study was 
conducted among Infantrymen in order to get a preview of their reactions 
to the plan. 

The study indicated that, although the new Career Program was
acceptable to enlisted men in principle, many enlisted men opposed
some of the details of the proposed program. In addition
to revealing attitudes of men toward the various phases of the Career
Program, the survey also disclosed areas of ignorance about the Career
Program. Thus, while attitudes toward the Career Program may be
difficult to change and some alteration in the administrative details of
the program may appear necessary, areas of ignorance about the program
may be skillfully attacked with well-directed informational activity.


Formulation of Administrative Policy 


2. Troop Attitude on Order of Demobilization. Months in advance of 
VE-Day, the War Department’s Special Planning Division was anticipa- 
ting the likelihood that demobilization policy adopted on defeat of 
Germany could result in morale disaster if the plan adopted were to be 
far out of line with what troops would consider fair. 

The problem then was to determine accurately what plan troops 
would be likely to consider fair. Troop cross sections were surveyed by 
research teams in the United States and in overseas theaters as early as 
November 1943 and several times subsequently. 

Research revealed four factors to be critical in soldiers’ minds: (1) 
length of service; (2) time overseas; (3) parenthood; and (4) combat 
participation. 







These were the four basic factors adopted in the Adjusted Service 
Rating Plan (Point Score for demobilization). 

One can read today in the book, just published, written by the 
Historical Division, Department of the Army, entitled The Army Ground 
Forces in World War II, how considerations of military necessity as
well as troop attitudes dovetailed into final determination of admin- 
istrative policy. 


Evaluating an Existing Program 


3a. Trend Surveys in the Universal Military Training Experimental 
Unit. When the Universal Military Training Experimental Unit was 
set up at Fort Knox, Kentucky, the program included various innova- 
tions in military procedure: Code of Conduct (a form of demerit system), 
Trainee Courts, with men themselves sitting somewhat as a jury, con- 
siderable emphasis on the Chaplains’ activities, compulsory educational 
program, concerted attention to off-duty activities, etc. In order to 
measure the trend of trainee reaction to these innovations, and to the 
training program in general, attitude studies have been made among the 
trainees in each of the cycles going through the unit,—studies conducted 
at the beginning, the middle and the end of each training cycle. The 
attitudes of the officers and cadre of the unit have also been obtained at 
the end of each training cycle. From the reports on these studies the 
Commanding General of Army Ground Forces and the Commanding 
General of the unit have followed any shift in reactions as trainees 
progressed through their training. As modifications are made in the 
experimental training program at the unit, the studies are re-designed 
to evaluate the results. 

3b. Studies Pertaining to Recruitment for the Military Service. In the 
Fall of 1947, staff officers of the Army’s Military Personnel Procurement 
Service Division and their advertising agency, the N. W. Ayer Co., 
began to feel as a result of a continuing decline in enlistments that a 
change in advertising direction was indicated. 

Accordingly, two coordinated surveys were conducted: the first by the 
Army, through its Attitude Research Branch to survey newly enlisted 
recruits; the other by the advertising agency through a commercial 
polling organization to survey young civilian males and their parents. 

The surveys yielded new insights into the problem of appropriate 
advertising for recruiting. For example, one traditional advantage of 
military service—early retirement and good retirement pay—was found 
to have practically no appeal among 17-18 year old youngsters, but was 
of considerable importance in the re-enlistment of older veterans. 







Evaluating an Information Program 


4. Most staff sections need at one time or other to have troops in the 
field informed on certain matters. Questions looming in the minds of 
those who must get out information are: (a) how can the information 
be made to reach the largest proportion of those who should be reached? 
(b) what presentation will be most effective in getting the information 
read after it is gotten out? (c) how can the information be put across so 
that it is most likely to be remembered once it is read? and once the in- 
formation is released, (1) how widely has it been seen, read and re- 
membered? (2) did it accomplish what it was supposed to accomplish? 

Effectiveness of any single information tool or device, such as movies, 
radio programs, posters, pamphlets, training courses, and the like, can 
validly be determined only by a true experimental approach using as 
subjects both control and experimental groups. During the war, num- 
erous such studies were made by the Research Branch on Hollywood- 
produced films which were calculated to give the soldier a better under- 
standing of the issues of the war. Compared to broad cross-sectional 
sample surveys, experimental evaluation studies of this kind are usually 
less costly, but they remain inordinately extravagant in the use of re- 
search personnel time, and also involve more than the usual cooperation 
of operating officials, that is, commanding officers. Consequently, since 
the war, such studies of information media have been restricted to those
of exceptional importance. One, currently under way, is an experimental 
evaluation of the new film produced under the auspices of the Surgeon 
General of the Army, entitled “Miracle of Living,” a film designed to
produce certain changes in information, attitude, and behavior among 
enlisted men with respect to venereal disease. 
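
The comparison of an experimental group with a control group described above can be sketched numerically. The following Python fragment is illustrative only: the function name `two_proportion_z` and all counts are hypothetical and are not taken from any Research Branch study. It assumes two independent simple random samples and uses the ordinary pooled two-proportion z statistic.

```python
import math


def two_proportion_z(success_exp, n_exp, success_ctl, n_ctl):
    """Compare an experimental group with a control group.

    Returns (difference in proportions, pooled z statistic).
    """
    p_exp = success_exp / n_exp
    p_ctl = success_ctl / n_ctl
    # Pooled proportion under the hypothesis that the film had no effect.
    pooled = (success_exp + success_ctl) / (n_exp + n_ctl)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_exp + 1 / n_ctl))
    return p_exp - p_ctl, (p_exp - p_ctl) / se


# Hypothetical counts: men answering an information item correctly
# among those who saw the film (experimental) and those who did not (control).
diff, z = two_proportion_z(success_exp=300, n_exp=500,
                           success_ctl=250, n_ctl=500)
print(f"difference = {diff:.3f}, z = {z:.2f}")
```

A large z suggests the difference between the groups is unlikely to be a sampling accident; without the control group there would be no baseline against which to read the experimental group's answers.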


Evidence Pro and Con of a Proposed Policy or Change in Policy 


5. Virtually all Research Branch studies have been used or are potentially
useful for the purpose of providing a source of quantitative information
and evidence in support of or against a proposed policy or a change
in policy. In contrast to arm-chair opinion based on umbilical meditations,
quantitative evidence derived from scientific sampling surveys is
an invaluable tool in the hands of skillful administrators. Among instances
which may be mentioned of this use of attitude research data are
studies of officer-enlisted man relationships used by the Doolittle Board 
in preparing its recommendations, attitudes toward Army Courts-martial 
procedure, survey of educational and recreational interests of soldiers, 
surveys among hospital patients with respect to treatment, surveys for 
the Quartermaster General on soldiers’ food and clothing preferences, 







comparison of competing physical training programs, survey among 
medical officers with respect to reasons why they would or would not 
accept commissions in the Regular Army, housing demands both among 
men in the Army and those about to be discharged, attitudes of officers 
toward logistical careers and training programs, and other studies of a 
more confidential nature. In short, as General Lanham has phrased it,
attitude research has, within small and useful margins of error, proved
itself to be the “morale radar” of the Armed Forces.


Received June 18, 1948. 





The Psychological Barometer on Communism, 
Americanism and Socialism 


Henry C. Link and Albert D. Freiberg 
The Psychological Corporation, New York City 


The following results are taken from three Barometer surveys: the 
August, 1948 survey made with 5000 urban interviews; the October, 1948 
survey made with 1000 interviews but with a comparable sample; the 
November, 1948 survey made with 10,000 urban interviews. The dates 
and size of sample are given with each table. 

In the August Barometer of 5000 interviews, one of the questions 
asked was: 


Q. What, in your opinion, are the three most dangerous threats within our own 
country to a prosperous America? 


This question was asked specifically for the employee relations division 
of the General Electric Company. The answers showed that two threats, 
inflation and Communism, were considered by far the most dangerous, 
with strikes and industrial conflict a distant third. The per cents 
mentioning various dangers were: 


Threats to a Prosperous America 


Answers, August 1948 





Inflation, high prices 

Other economic threats such as a depression, 4.4%; 
high or low wages, 1.7%; O.P.A. or lack of O.P.A., 
.9%; miscellaneous, 1.8%; total 

Communism 

Fascism, .9%; Socialism, .5%; foreign spies and in- 
filtration, 1.1%; lack of freedom, .3%; total 

Strikes, struggle between capital and labor 

Power of unions, organized labor 

Taft-Hartley Act 

Politicians, political parties, politics 

War talk, threat of war 

Big business, monopolies, Wall Street, capitalism, 
high profits 

Race prejudice and intolerance 

Civil rights program, Jews, Negroes, immigrants 

Atomic bomb, 1.8%; inadequate military defense, 
.6%; draft, .4%; poor foreign policy, E.R.P., 
1.4%; the Russians, 1.6%; total 




Social and psychological threats: 
Lack of housing, 4.2%; alcohol, drinking, 3.8%; 
crime, delinquency, 3.6%; lack of religion, 3%; 
family trouble, 1.9%; poor education, 1.9%; 
movies, theatres, radios, comic books, .6%; lack 
of cooperation, 3%; greed, 1.7%; misc. 5.8%; total 

Bad govt., bureaucracy, graft, govt. racketeers, govt. 
restrictions, govt. spending, high taxes; total 

Natural disasters including fire, floods, rodents, 
drought, wastefulness of resources 

Miscellaneous 

Don’t know 

Total Interviews 


Is Communism Becoming Dangerous? 


The growing danger of Communism in the United States is further 
indicated by the answers in 1946 and in 1948 to this question: 


Q. It is being said that Communism is becoming a dangerous thing in the United 
States. Do you think this is true or not? 


Answers              April 1946    October 1948
                          %              %
True                    51.2           67.0
Not true                34.1           24.5
Don’t know              14.7            8.5

Total Interviews        2500           1000
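
As a rough check on whether the rise from 51.2 to 67.0 per cent could be a sampling accident, one can compute the standard error of the difference between the two percentages. The sketch below assumes two independent simple random samples, which is only an approximation to the sampling actually used in these surveys; the function name is hypothetical.

```python
import math


def diff_and_standard_error(pct_a, n_a, pct_b, n_b):
    """Return (difference, standard error), both in percentage points."""
    p_a, p_b = pct_a / 100, pct_b / 100
    # Unpooled standard error of the difference of two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return pct_b - pct_a, 100 * se


# "True" answers: April 1946 (2500 interviews) vs. October 1948 (1000 interviews).
diff, se = diff_and_standard_error(51.2, 2500, 67.0, 1000)
print(f"difference = {diff:.1f} points, standard error = {se:.1f} points")
```

Under these assumptions the difference is several times its standard error, which is consistent with reading the change as a real shift in opinion rather than sampling fluctuation.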


This conviction is shared pretty much by all socio-economic groups, 


and by union and non-union families alike, as shown by the following 
table: 


                          Socio-Economic Group      Union Membership
Answers, Oct. 1948        A     B     C     D       Union   Non-Union
                          %     %     %     %         %         %
True                     80    66    67    62        65        68
Not true                 14    26    27    24        29        23
Don’t know                6     8     6    14         6         9

Total Interviews        100   300   400   200       278       722


Are Communists Traitors? 


In previous surveys, it was found that Communists in the United States 
were regarded by 77 per cent as a fifth column, loyal to Russia first,
rather than as a typical American political party. A majority favored 


outlawing the Communist party. The sharpest definition of this issue 
was made in the question: 







Q. Do you think a Communist is a traitor to the United States?

Answers              January 1948    October 1948
                          %               %
Yes                      65              70.6
No                       18              18.0
Don’t know               17              11.4

Total Interviews        600              2500


Union and non-union members thought alike on this subject, whereas, 
by socio-economic groups, the “yes” answers ranged from 81 per cent in
the “A” group to 62 per cent in the “D” group.


Is Socialism Becoming Dangerous? 


Whereas Communism in the last two years has been sharply recognized 
by the American people as a threat to their institutions, their reactions to 
Socialism are quite different. Where 67 per cent say that Communism is


becoming dangerous, only 26 per cent say that Socialism is becoming 
dangerous. 


Q. It is being said that Socialism is becoming a dangerous thing in the United States. 
Do you think this is true or not? 


                                 Socio-Economic Group      Union Membership
Answers, Oct. 1948      Total     A     B     C     D      Union   Non-Union
                          %       %     %     %     %        %         %
True                     26.4    30    34    21
Not true                 50.5    54    50    53
Don’t know               23.1    16    16    26

Total Interviews         1000   100   300   400            278


Are Communism and Socialism the Same? 


Because of these widely different reactions toward Communism and 
Socialism, this further question was asked: 


Q. Do you think that Socialism and Communism are about the same or are they 
different? 


                                 Socio-Economic Group      Union Membership
Answers, Oct. 1948      Total     A     B     C     D      Union   Non-Union
                          %       %     %     %     %        %         %
Same                             16    21          22       28        21
Different                        73    66    59    53       57        62
Don’t know                                                  15        17

Total Interviews                100   300   400            278       722







Union members are more likely to regard them as the same than are 
non-union members, but the higher the educational level, the more likely 
people are to regard them as different. In answer to the question: 

Q. What difference do you think there is between them? 


Some of the principal reasons given were: Communism is totalitarian 
while Socialism isn’t; Socialism recognizes individual rights; Socialism is 
more liberal, more democratic; Communism means force, Socialism does 
not; Socialism is gradual, Communism is revolutionary; Communism is 
bad, Socialism is good, etc., etc. However, 39 per cent gave no answer.


Specific Issues on Communism and Socialism 


The sharp repudiation of Communism as compared with Socialism is 
no doubt influenced by the strained relations between Russia and the 
United States. Therefore, it is of unusual significance to ascertain 
people’s reactions to specific measures which tend to bring about Socialism
or Communism, or both, in this country. In the previous survey,¹
we reported on such issues as government versus private ownership of 
manufacturing companies, who does the most for the good of the workers, 
preference for jobs in private industry or the government, and investing 
money in government bonds or private concerns. 

One of the questions asked in the October survey was: 


Q. Do you think government control of business would be a step toward Com- 
munism? Toward Socialism? 


Answers, Oct. 1948     Toward Communism    Toward Socialism
                              %                   %
Yes                          61.3                49.7
No                           22.5                18.9
Don’t know                   16.2                31.4

Total Interviews             1000                1000


The answers by union membership and socio-economic group to these 
two questions were: 
Q. Do you think government control of business would be a step toward Commun- 
ism? 


                          Socio-Economic Group      Union Membership
Answers, Oct. 1948        A     B     C     D       Union   Non-Union
                          %     %     %     %         %         %
Yes                      64    65    63    50        59        62
No                                                   23        23
Don’t know               15    12    14              18        15

Total Interviews        100   300   400   200       278       722


1 Link, H. C. and Freiberg, A. D. The 97th psychological barometer. Journal of 
Applied Psychology, 1948, 32, 443-451. 










Q. Do you think government control of business would be a step toward Socialism? 


                          Socio-Economic Group      Union Membership
Answers, Oct. 1948        A     B     C     D       Union   Non-Union
                          %     %     %     %         %         %
Yes                      63    58    48    35        41        53
No                       16    19    19    20        21        18
Don’t know               21    23    33    45        38        29

Total Interviews        100   300   400   200       278       722


Not inconsistent with the answers to the question on the differences 


between Communism and Socialism were the answers to the following
question: 


Q. Do you think a country can have democracy without having private capitalism?

                                 Socio-Economic Group      Union Membership
Answers, Oct. 1948      Total     A     B     C     D      Union   Non-Union
                          %       %     %     %     %        %         %
Yes                      20.9    22    19    21    24       24        20
No                       57.4    61    65    58    42       50        60
Don’t know               21.7    17    16    21    34       26        20

Total Interviews         1000   100   300   400   200      278       722


More than 42 per cent are either uncertain or say that private capitalism
is not necessary for democracy. This is especially interesting in view
of the recent statements by Dwight D. Eisenhower, in his installation
address as President of Columbia University and other talks, to the effect
that private property rights in the United States are the keystone of all
other democratic freedoms.


Price Control and the O.P.A. 


The readiness of the people to accept socialistic controls, or govern- 
mental controls which amount to the confiscation of property, is illus- 
trated by the answers to this question: 


Q. What do you think would do most to keep prices down: the O.P.A. and its price 
ceilings, or free competition by business without any O.P.A.? 


Union Membership — 
Socio-Economic Group 


Non- 
Answers, November 1948 Total A B C D Union Union 











% % TT % % % 
O.P.A. and its price 
ceilings 35 43 52 49 38 
Competition by business 
without any O.P.A. 
Don’t know 
Total Interviews 


43 30 48 
13 18 14 







The opinions of people on price control have been subject to very 
wide fluctuations. In the spring of 1946, all polls showed a large majority 
of the public favoring the O.P.A. By the fall of 1946, this attitude had 


almost completely reversed itself. The results of our polls on this sub- 
ject are: 


Answers                          Oct. 1946    Aug. 1948    Nov. 1948
                                     %            %            %
O.P.A. and its price ceilings       26.1         47.2         41.5
Competition by business
  without any O.P.A.                65.1         39.7         44.5
Don’t know                           8.8         13.1         14.0

Total Interviews                    2500         5000         5000





Socialistic Trends in Housing 


A further illustration of people’s readiness to accept socialistic meas- 
ures is provided by their answers to this question on housing: 


Q. How do you think the housing problem will be settled best: (a) by having the 


Federal Government furnish the money and plans, or (b) by leaving it to private in- 
dividuals and builders? 


                                      Socio-Economic Group      Union Membership
Answers, Oct. 1948           Total     A     B     C     D      Union   Non-Union
                               %       %     %     %     %        %         %
Having Federal Govt.
  furnish money and plans     37.0    28    30    39    48       44        34
Leaving it to private
  builders and individuals    51.8    64    59    51    36       44        55
Don’t know                    11.2     8    11    10    16       12        11

Total Interviews              1000   100   300   400   200      278       722
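
The "Total" column in a table of this kind should equal the average of the four socio-economic group percentages, weighted by the number of interviews in each group. The following Python sketch checks this for the housing question using the figures printed above; the function and variable names are illustrative, and small discrepancies are expected because the group percentages are printed as rounded whole numbers.

```python
def weighted_total(percents, sizes):
    """Average of group percentages, weighted by interviews per group."""
    return sum(p * n for p, n in zip(percents, sizes)) / sum(sizes)


# Interviews in groups A, B, C, D (October 1948 survey).
group_sizes = [100, 300, 400, 200]

# Each row: (reported total per cent, per cents for A, B, C, D).
rows = {
    "Federal Govt. furnish money and plans": (37.0, [28, 30, 39, 48]),
    "Private builders and individuals": (51.8, [64, 59, 51, 36]),
    "Don't know": (11.2, [8, 11, 10, 16]),
}

for label, (reported, by_group) in rows.items():
    computed = weighted_total(by_group, group_sizes)
    print(f"{label}: reported {reported:.1f}, recomputed {computed:.1f}")
```

The same check can be applied to the union and non-union columns, using 278 and 722 as the weights.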





Other issues bearing on the conflict between Communism, Socialism, 
and traditional Americanism or a democracy based on private capitalism 
will be taken up from time to time. 


Attitude Toward the Taft-Hartley Law 


The feeling against the Taft-Hartley law among union members or 
union families is not nearly as unanimous as union leaders have presented
it to be. Of those questioned, 94 per cent answered “yes” to the
question: Have you heard of the Taft-Hartley law which was passed by 


Congress to regulate unions, control strikes and get rid of Communist 
leaders? Then we asked: 







Q. During the past year do you think this law has done more harm than good or 
more good than harm? 


                                 Socio-Economic Group      Union Membership
Answers, Oct. 1948      Total     A     B     C     D      Union   Non-Union
                          %       %     %     %     %        %         %
More harm than good      24.8    15    23    29             34        21
More good than harm      39.7    60    50    34             29        44
Don’t know               35.5    23    27    37             37        35

Total Interviews         1000   100   300   400            278       722


The Chief Victims of the Increase in the Cost of Living 


The answers to this question show one of the sharpest differences by 
socio-economic groups that we have ever recorded. 


Q. Who has suffered most from the increase in the cost of living: the workers on 
salaries and wages, or the people who must live on the income from life insurance, 
Government bonds, stocks and other savings? 

                                                   Socio-Economic Group
Answers, Oct. 1948                       Total     A     B     C     D
                                           %       %     %     %     %
Workers on salaries and wages                     25    27    38    52
People who must live on income from
  life insurance, etc.                            69    69    57    33
Don’t know                                6.9      6     4     5    15

Total Interviews                         1000    100   300   400   200


Family Prosperity 
Q. Is your family more prosperous (or better off) today than two years ago, less 
prosperous, or the same? 


In spite of high prices, most families continue to think of themselves 
as better off or as well off as they were two years ago. 


                                    Socio-Economic Group       Union Membership
Answers, November 1948     Total     A      B      C      D     Union   Non-Union
                             %       %      %      %      %       %         %
More prosperous             24.2    23     25     23     25      25        24
The same                    45.8    49     46     46     44      43        47
Less prosperous             26.0    25     25     27     27      29        25
Uncertain                    4.0     3      4      4      4       3         4

Total Interviews            5000   500   1500   2000   1000    1438      3562


The above figures show a rather significant difference between the 
opinions of union members and non-union members. Although the 







unions are organized to obtain quick and broad wage increases, union 
members do not consider themselves as well off in the scale of living as do 
non-union members who have had to rely on themselves. Contrary to 
the popular belief that the white collar workers are the principal losers 
from the cost of living rise, this group, principally the “B” group,
considers itself better off than does the large group of skilled and
semi-skilled wage workers where unionism is strongest (groups “C” and
“D”). This may be due in part to the steadiness of their work as compared
with the time lost by wage earners through strikes, material
shortages and the indirect results of strikes in related industries. 

We have now been asking this question for several years and some of 
the results are as follows: 


                    Oct.   Oct.   Oct.   Apr.   Oct.   Apr.   Oct.   Nov.
Answers             1941   1943   1945   1946   1946   1947   1947   1948
                      %      %      %      %      %      %      %      %
More prosperous      29      -     32     26     31     29     24     24
The same             46      -     51     48     44     42     46     46
Less prosperous      23      -     15     24     22     26     28     26
Don’t know            2      -      2      2      3      3      2      4

Total Interviews   2000   2500   2500   2500   2500   2500   2500   5000


Probability of Another War 


The prospects of avoiding war, in people’s opinion, have improved 
during the past year, as shown by the October, 1948 survey. The ques- 
tion was: 


Q. Do you think we can make a lasting peace or do you think that there will be 
another war within the next 20 years or so? 


                                Feb.   Oct.   Oct.   Oct.   Oct.   Oct.
Answers                         1943   1944   1945   1946   1947   1948
                                  %      %      %      %      %      %
Lasting peace                    47     28     28     18     11     20
Another war within 20 years      43     54     59     74     77     69
Don’t know                       10     18     13      8     12     11

Total Interviews               2500   2500   2500   2500   2500   1000


Another question on this same subject was: 


Q. How about the next three or four years: another war or no war?

Answers, October 1948       %
War                        35.3
No war                     42.8
Don’t know                 21.9

Total Interviews           1000













The Civil Rights Issue 
In view of the great controversy over the civil rights program, the 
following question was asked with interesting results: 


Q. Which would do more good for American Negroes: (a) passing laws to give them 
equal rights with whites; (b) a program to teach white and Negro to get along together? 


Socio-Economic Geographic Area 
Group 





Mid- Far 
Answers, October, 1948 Total B C D_ East West South West 








Oo oF GF oO 
70 To 7o To 6% % 


Passing laws for equal rights 11.7 12 18 12 13 
A program to teach whites 
and Negroes to get along E 80 67 77 78 
Neither - cs 2 1 ° 
Don’t know 96 11 8 9 18 10 9 
Total Interviews 1000 100 300 400 200 370 315 
* Less than .5% 


Explanation of the Surveys 


Each of these surveys was made with a true cross-section of the urban 
population. The August and November surveys were made in 100 cities 
and towns; the October survey was made in 47 cities and towns. 

Sampling Methods. A modified area sampling method was used. 
All interviews were assigned by the local supervising psychologist by 
blocks and streets in accordance with maps constructed to designate the 
proper socio-economic levels. These maps are made to divide the population
into four principal groups: the “A” group, consisting primarily of
owners and executives; the “B” group, primarily white-collar and
semi-professional; the “C” group, or skilled factory and transportation workers;
and the “D” group, or the less skilled. All interviews were made in the home, but
only one in a family; half were made with women, half with men. 
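
The group sizes that result from this design can be illustrated with a short Python sketch. It is not the authors' procedure: the assignment of interviews by blocks and streets from local maps is not reproduced here. The shares of 10, 30, 40 and 20 per cent for groups A to D are an assumption inferred from the "Total Interviews" rows of the tables (for example 100, 300, 400 and 200 in a sample of 1000), and the name `allocate` is hypothetical.

```python
# Assumed share of interviews per socio-economic group, in per cent
# (inferred from the tables, not stated by the authors).
GROUP_SHARES = {"A": 10, "B": 30, "C": 40, "D": 20}


def allocate(total_interviews):
    """Split a sample by group share, then half men and half women."""
    plan = {}
    for group, share in GROUP_SHARES.items():
        n = total_interviews * share // 100
        men = n // 2
        plan[group] = {"men": men, "women": n - men}
    return plan


for group, cell in allocate(1000).items():
    print(group, cell)
```

With a total of 5000 the same shares give 500, 1500, 2000 and 1000 interviews, matching the group totals printed for the November survey.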


Received December 13, 1948. 
Early publication. 





An Analysis of the 1948 Polling Predictions * 


Daniel Katz 
Survey Research Center, University of Michigan 


After naming the winning candidate successfully in three presidential 
elections, the public opinion polls stumbled badly in 1948 in their unani- 
mous forecast of a Dewey victory. With the exception of the Roper 
poll, however, the 1948 performance, from an arithmetic point of view, 
was not as startlingly different from previous forecasts as might be 
supposed. In 1936 Gallup underestimated the Democratic Party vote 
by 6.9 percentage points, in 1940 by 2.5 points, in 1944 by 1.5 points, 
and in 1948 by 5.0 points. Similarly, Crossley underestimated the Demo- 
cratic presidential vote by 6.9 per cent in 1936, by 1.6 in 1944 and by 
4.7 in 1948. On the other hand, Roper, who had never missed a presi- 
dential election by more than one percentage point and who had been 
within 0.2 of the Roosevelt vote in 1940 and 1944, had the largest error 
of all in 1948 with an underestimate of the Truman vote of 12.4 per- 
centage points (see Table 1). 


Table 1

National Popular Vote * and Predictions

                          Predictions             Error in Percentage Points
            Actual   Gallup  Crossley  Roper      Gallup  Crossley  Roper

Truman       49.5     44.5     44.8     37.1       -5.0     -4.7    -12.4
Dewey        45.1     49.5     49.9     52.2        4.4      4.8      7.1
Wallace       2.4      4.0      3.3      4.3        1.6      0.9      1.9
Thurmond      2.4      2.0      1.6      5.2       -0.4     -0.8      2.8
Others        0.6      0.0      0.4      1.2       -0.6     -0.2      0.6

            100.0%   100.0%   100.0%   100.0%





* Based upon the final figures compiled by the Associated Press. 
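
The error columns of Table 1 are simply each poll's predicted share minus the actual share. The following is a minimal sketch of that arithmetic, not part of the original analysis. The dictionary and function names are illustrative, and Roper's "Others" figure is taken as 1.2, the value that brings his column to 100.0 per cent.

```python
# Shares of the national popular vote, in per cent, transcribed from Table 1.
ACTUAL = {"Truman": 49.5, "Dewey": 45.1, "Wallace": 2.4, "Thurmond": 2.4, "Others": 0.6}

PREDICTIONS = {
    "Gallup":   {"Truman": 44.5, "Dewey": 49.5, "Wallace": 4.0, "Thurmond": 2.0, "Others": 0.0},
    "Crossley": {"Truman": 44.8, "Dewey": 49.9, "Wallace": 3.3, "Thurmond": 1.6, "Others": 0.4},
    # Assumption: Roper's "Others" entry is read as 1.2 so that the column totals 100.0.
    "Roper":    {"Truman": 37.1, "Dewey": 52.2, "Wallace": 4.3, "Thurmond": 5.2, "Others": 1.2},
}


def prediction_errors(predicted, actual):
    """Signed error in percentage points: predicted share minus actual share."""
    return {name: round(predicted[name] - actual[name], 1) for name in actual}


ERRORS = {poll: prediction_errors(shares, ACTUAL) for poll, shares in PREDICTIONS.items()}

for poll, errors in ERRORS.items():
    print(poll, errors)
```

A negative value is an underestimate. On these figures every poll underestimated Truman and overestimated Dewey.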


Both the polling predictions and the general picture in the public
press were misleading, not only in their forecast of a Dewey victory, but
in their analysis of the voting trends. The percentage of the votes cast
for the Republicans was remarkably close to the percentage achieved
in 1944. Governor Dewey polled 45.8 per cent of the national vote in
1944 and 45.1 per cent in 1948. The reason why the 1948 election was
close was not that there had been a gain in the Republican vote, but that
there were defections from the Democratic vote to Governor Thurmond
and Henry Wallace. Though the national percentage total for Dewey
remained constant, there were interesting shifts in the sectional support
he received in the two elections. The Republican candidate made slight
gains in the industrial east and on the Pacific Coast, but suffered real
losses in the west-central states, namely in Iowa, Kansas, Minnesota,
Missouri, Nebraska, South Dakota and Wisconsin. Neither the polls
nor the newspapers detected this very significant reversal of national
voting behavior in which Truman carried a number of the farm states.

* The Editor solicited this paper and gives it priority in publication
because of its importance and timeliness. Only rarely would a situation
arise which would justify such special treatment.


State-by-State Errors of the Polls 


In their predictions of the specific states the polls almost doubled 
their average state error of 1940 and 1944. They were not as far off as 
in 1936, save that their error this time was one of sign as well as magni- 
tude, i.e. they missed the winning candidate. Crossley’s average state 
error of 4.4 was almost a percentage point better than Gallup’s. The 
Crossley poll missed 11 of the 48 states by six percentage points or more 
as against 16 similar misses by the American Institute of Public Opinion. 
It is significant that most of Gallup’s large errors were in states where the 
Republicans lost votes from 1944 to 1948. Where the Republicans made 
gains Gallup’s errors tended to be smaller. In other words, the Gallup 
prediction was that of a general increase, fairly evenly distributed over 
the nation, rather than a differential increase in certain states. This 
means that no simple correction for Gallup’s inflation of the Republican 
vote on a state-to-state basis would have remedied his inaccuracies. 
Table 2 presents the state-by-state errors of the Gallup and Crossley 
polls. Roper made no state estimates. 
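
The two summary figures quoted for Crossley (an average state error of 4.4 and 11 misses of six points or more) can be checked against the Crossley column of Table 2. The sketch below is illustrative only. It uses the magnitudes as printed and ignores sign, reads the Arkansas entry as 1.3, and omits Alabama, where Truman was not on the ballot. The function names are hypothetical.

```python
# Crossley's state errors from Table 2, in percentage points (magnitudes only).
# Assumption: the Arkansas entry is read as 1.3. Alabama is omitted (no Truman ballot line).
CROSSLEY_ABS_ERRORS = {
    "Arizona": 1.2, "Arkansas": 1.3, "California": 3.6, "Colorado": 2.9,
    "Connecticut": 8.4, "Delaware": 0.8, "Florida": 0.8, "Georgia": 1.2,
    "Idaho": 5.0, "Illinois": 7.1, "Indiana": 4.4, "Iowa": 10.3,
    "Kansas": 2.6, "Kentucky": 3.7, "Louisiana": 4.8, "Maine": 3.3,
    "Maryland": 2.0, "Massachusetts": 7.7, "Michigan": 0.6, "Minnesota": 9.2,
    "Mississippi": 8.2, "Missouri": 3.1, "Montana": 4.1, "Nebraska": 3.8,
    "Nevada": 2.4, "New Hampshire": 5.7, "New Jersey": 4.9, "New Mexico": 4.4,
    "New York": 3.0, "North Carolina": 1.0, "North Dakota": 4.4, "Ohio": 4.5,
    "Oklahoma": 4.7, "Oregon": 4.4, "Pennsylvania": 4.9, "Rhode Island": 4.8,
    "South Carolina": 4.9, "South Dakota": 10.0, "Tennessee": 1.1, "Texas": 0.6,
    "Utah": 6.0, "Vermont": 5.9, "Virginia": 1.9, "Washington": 6.6,
    "West Virginia": 7.3, "Wisconsin": 7.7, "Wyoming": 5.6,
}


def average_state_error(errors):
    """Mean of the absolute state errors, in percentage points."""
    return sum(errors.values()) / len(errors)


def large_misses(errors, threshold=6.0):
    """States whose absolute error is at least `threshold` percentage points."""
    return sorted(state for state, err in errors.items() if err >= threshold)


print(round(average_state_error(CROSSLEY_ABS_ERRORS), 1))  # 4.4
print(len(large_misses(CROSSLEY_ABS_ERRORS)))              # 11
```

The same two functions could be applied to the Gallup column wherever its entries are legible.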


General Reasons for the Failure of the Polls 


To the world of applied research the poor predictive performance of 
the polls was as much of an upset as the election of President Truman 
was to the newspaper world. Yet from a scientific point of view there 
was evidence, before November 1948, that the polls could not continue 
their successful record without a change in basic methodological approach 
as well as in specific techniques. The general philosophy of the pollsters 
was one of rule-of-thumb procedure rather than sound theory and method. 
What had worked in the past was accepted at face value without an
analysis of why it had worked or of the conditions under which it had
worked. Moreover, their specific techniques of sampling, of
interviewing, and of research design were known to have serious weaknesses.

Table 2

State-by-State Errors of Gallup and Crossley

                   % of Major       Error in Percentage Points
                   Party Vote
State              for Truman *       Gallup        Crossley

Alabama **
Arizona               53.8              0.8           +1.2
Arkansas              61.7              8.7           +1.3
California            47.6              4.6           -3.6
Colorado              51.9              2.9            2.9
Connecticut           48.4              4.4            8.4
Delaware              48.8              1.8            0.8
Florida               48.8              3.8            0.8
Georgia               60.8              2.8            1.2
Idaho                 50.0              3.0            5.0
Illinois              50.1              4.1            7.1
Indiana               48.4              4.4            4.4
Iowa                  50.3              7.3          -10.3
Kansas                44.6              5.6            2.6
Kentucky              56.7              7.7            3.7
Louisiana             32.8              6.2            4.8
Maine                 42.3              0.3            3.3
Maryland              48.0              4.0            2.0
Massachusetts         54.7              9.7            7.7
Michigan              47.6              3.6            0.6
Minnesota             57.2             11.2            9.2
Mississippi            9.8              5.2            8.2
Missouri              58.1              6.1            3.1
Montana               53.1              3.1            4.1
Nebraska              45.8              7.8            3.8
Nevada                50.4              3.4            2.4
New Hampshire         46.7              2.7            5.7
New Jersey            45.9              3.9            4.9
New Mexico            56.4              5.4            4.4
New York              45.0              6.0            3.0
North Carolina        58.0              7.0            1.0
North Dakota          43.4              5.4            4.4
Ohio                  49.5              7.5            4.5
Oklahoma              62.7              7.7            4.7
Oregon                46.4              4.4            4.4
Pennsylvania          46.9              2.9            4.9
Rhode Island          57.8              3.8            4.8
South Carolina        24.1             13.9            4.9
South Dakota          47.0              6.0          -10.0
Tennessee             49.1              1.9            1.1
Texas                 65.4              0.6            0.6
Utah                  54.0          [illegible]        6.0
Vermont               36.9          [illegible]        5.9
Virginia              47.9          [illegible]        1.9
Washington            52.6          [illegible]        6.6
West Virginia         57.3          [illegible]        7.3
Wisconsin             50.7          [illegible]        7.7
Wyoming               51.6          [illegible]        5.6

Average State Error                 [illegible]        4.4

* Final figures compiled by the Associated Press and reported by the New York
Times December 11, 1948.
** Truman not on the ballot.

The pollsters began in 1936 with an improvement upon the Literary
Digest biased method of sampling. Since 1936 they made some minor
improvements and learned either to take advantage of their compen- 
sating errors or to correct for their biases, but they never made major 
advances in methodology. Why, then, did they do so well in 1940 and 
1944 with their methods and techniques and so poorly in 1948? The 
two main reasons seem to be: (1) Their experience with techniques and
corrections was gained in Roosevelt elections. With a change in the
political scene their procedures no longer functioned effectively. Thus Gallup and
Crossley started in 1936 with an error of 6.9 percentage points, improved 
their performance in subsequent Roosevelt elections, but moved back 
toward their original starting point when they attempted a presidential 
election in which different factors were operative; (2) The Roosevelt 
elections were highly structured situations in which the dominant per- 
sonality of Roosevelt crystallized attitudes and opinions. With this 
definite bipolarity of attitude it was not difficult, even with poor tech- 
niques, to make election predictions. 

Moreover, the polls have never adequately examined the nature of 
the problem of prediction. In basic science, predictions are made not for 
an open system of events but in terms of contingent conditions. In 
applied science, the engineer or the weather forecaster makes some 
estimates of the possible determinants of the process or event he is 
attempting to predict. Similarly in attempting to make predictions 
about social behavior, the social scientist must take into account the 
relevant field of forces. He cannot merely single out a behavioral or 
attitudinal trend and predict its repetition. Yet this is essentially what 
the pollsters attempt to do. They reproduce the national election in 
miniature and assume that the final election will be a repetition of the 
trend they have measured without recourse to the many determinants 
of voting behavior. 

It should be emphasized at the start that their fundamental mistake is 
not to be found so much in any one technique, such as quota sampling or 
fixed-alternative questions, as in poor research design. In basic science 
and in applied science we attempt to measure the relationship between 
two variables and seek to establish causal connections. We do this, 
moreover, at some level of generality beyond the specific content of one 
particular situation so that we can build up generalizations which apply 
to the same type of social process. This means that we do more than 
report the given percentage of people who favor the Marshall Plan or say 
that they will vote for President Truman. This means, moreover, that 
we must conceptualize and identify the important variables and obtain 
systematic measures of them. 

If we apply this logic of research design to election prediction, we 
need to set up a number of studies designed to measure the determinants
of voting behavior or turn-out and the causal conditions affecting political
conviction. It is not enough to have some rough measure of background 
variables such as income level, or amount of schooling, or even union 
membership. We need some picture, in addition, of the intervening 
variables which will give us the perceptions and attitudes related to 
political parties and political party candidates. How much of this can 
be done by public opinion polls is a debatable question, but it is scarcely 
in their best interests to continue to lag behind the advances made by 
psychologists and social scientists in their studies of human behavior. 
These points have all been made before the 1948 polling debacle and can 
be found in the writings of A. Campbell, D. Cartwright, R. Crutchfield, 
D. Krech and the present writer.¹


Sources of Error in the 1948 Polls 


It will never be possible to make a precise assessment of the contribu- 
tion of every factor to the error of the polls. Since the polls did not set 
up adequate hypotheses about voting behavior and political preferences 
during the campaign, the data are not now available for analysis. It is 
not even possible to go back and reinterview the same respondents 
sampled by the polls because the polls did not take names or addresses.
There are some limited panel studies where this is being done and they 
will throw some light upon the problem. Gallup did ask a sample of 
respondents to return postcards after the election to indicate how they 
voted, but the selective bias in a mail-return makes these data hazardous 
to interpret. 

It is usually assumed that the important sources of error, however, 
are to be found in: (1) differential turn-out; (2) the undecided voter; (3) 
the changing voter; and (4) the representativeness of the sample. 


Differential Turn-out 


Australia is the pollsters’ Utopia, for in Australia the law requires 
all citizens to vote. It must be remembered that in our country fore- 
casting an election involves two predictions: an estimate of how voters 
feel about the candidates and an estimate of which voters will go to 
the polls on election day. In general the polling organizations make 
no systematic correction for turn-out but depend upon their educational 
bias in sampling for the major adjustment. 

¹ A. Campbell. Polling, open interviewing, and the problem of interpretation. J.
Soc. Issues, 1946, 2, 67-71; D. Cartwright. Review of G. Gallup’s A guide to public
opinion polls. J. consult. Psychol., 1945, 9, 201-202; D. Krech and R. Crutchfield.
Theory and problems of social psychology. New York: McGraw-Hill, 1948; D. Katz.
Survey technique and polling procedure as methods in social science. J. Soc. Issues,
1946, 2, 33-44; and D. Katz. The interpretation of survey findings. J. Soc. Issues,
1946, 2, 62-66.


One explanation of both Dewey’s defeat and the pollsters’ failure is 
that the Republicans stayed away from the polls in greater numbers than 
they usually do, as compared to the customary voting behavior of the 
Democrats. The reasons marshalled to support this theory are varied 
and not too consistent. For example, the opinion polls defeated them- 
selves by making the Republicans overconfident and so less energetic 
about getting out the vote; or the Republicans were apathetic about 
their standard bearer; or the farmers were too busy getting in the harvest 
on election day to go to the polls. 

The hypothesis of Republican overconfidence, or indifference, in its 
effect upon turn-out, makes sense only if we assume that the polls were 
accurate in their original estimates about the wishes of the people. It 
can be argued more plausibly that the nature of the turn-out in 1948 
reduced rather than increased the prediction error. Neither party did a 
good job on turn-out in 1948. Many Democrats as well as Republicans 
stayed away from the polls. Against the overconfidence of the Repub- 
licans was the lack of motivation on the part of millions of Democrats 
who idolized Roosevelt and found Truman a weak substitute. Since 
there are considerably more people in the country who consider them- 
selves Democrats than consider themselves Republicans and since young 
people who come of voting age are more likely to favor the Democratic 
than the Republican ticket, the chances are that if the national turn-out 
had been as heavy in 1948 as in 1940, there would have been a Truman 
landslide and not a Dewey victory. The overconfidence hypothesis 
ignores the fact that party machines are organized on a local and state 
basis. Even though the Republicans thought the presidential election 
was in the bag, there were many Congressional, state and local offices in 
doubt, for which it was necessary to turn out the vote. And the states in 
which overconfidence should have been the highest according to this 
theory were the states where Dewey actually made gains as in Maine 
and Vermont. 

There is no proof that the upper-income Republican groups relative 
to lower-income groups failed to vote in greater numbers in 1948 than in 
the past. The figures in Table 3 show turn-out by economic groups for 
1948 and the heavy-voting year of 1940. 

The NORC survey in 1940 showed that the lowest income group 
stayed away from the polls in a ratio of three to one compared to the 
highest income group. The Roper figures show an even higher ratio 
in 1948 in favor of greater turn-out among the upper-income people. It 
should be stated, however, that the comparability of these figures leaves 
much to be desired. They were obtained by two different organizations 
and the income groupings may vary considerably. They are suggestive, 
however, in their implication that the 1948 turnout actually favored 





Analysis of 1948 Polling Predictions 21 


the Republicans. The same inference was made before the election when 
experts asserted that a turn-out of under 49,000,000 would help the 
Republicans. 
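
The "three to one" comparison can be read directly from the non-voting percentages in Table 3. The short sketch below divides the lowest group's abstention rate by the highest group's, assuming that the NORC figure of 16.0 per cent refers to the highest income group, as the text implies. The names are illustrative, and the caution about comparability of the two surveys applies equally here.

```python
# Per cent of each economic group that did not vote, from Table 3.
# Assumption: NORC's 16.0 per cent is the figure for the highest income group.
DID_NOT_VOTE = {
    "1948 Roper": {"A": 11.3, "D": 40.6},
    "1940 NORC":  {"A": 16.0, "D": 47.0},
}


def abstention_ratio(rates, low="D", high="A"):
    """Ratio of the lowest group's non-voting rate to the highest group's."""
    return rates[low] / rates[high]


RATIOS = {survey: round(abstention_ratio(rates), 1) for survey, rates in DID_NOT_VOTE.items()}
print(RATIOS)  # {'1948 Roper': 3.6, '1940 NORC': 2.9}
```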

Table 3

Turn-out by Economic Groups in 1948 and in 1940

                           Did Not Vote

Economic      1948 Post-election      1940 Post-election
Group            Poll by Roper          Survey by NORC

A                   11.3%                   16.0%
B                   14.6
C                   26.7                    33.0
D                   40.6                    47.0











Similar interpretations come from a study of the farm and city vote. 
If the Truman victory were a matter of differential turn-out, then we 
would expect bigger Democratic majorities than usual in the industrial 
centers where the unions and Democratic machines are entrenched. 
But this was not the case. Truman lost a number of industrial eastern 
states and ran surprisingly well in the farm belt and in rural districts. 
Preliminary analysis of rural and urban counties corroborates the na- 
tional trend. Dewey lost not because the Republican farmers stayed 
away from the polls but because many of them voted for Truman. 

Though turn-out does not seem to be the explanation for the diffi- 
culties of the pollsters, it is essential in future research that attempts be 
made to measure, or take into account more thoroughly, the factors 
which affect turn-out. Certainly much more can be done to get at the 
spontaneous forces within voters which get them to the polls on election 
day. Crossley has made a start on this problem with questions on 
voting intention and certainty of voting but in addition we need to 
study the potency of the individual’s involvement in both the national 
and local elections, the importance the individual attaches to his own vote, 
and his feeling of responsibility toward voting participation in the 
democratic process. The external factors are more difficult to get at 
but unless we know something about the relative strength of political 
machines in various states and the pressures of the individual’s own social 
group, we are handicapped in making predictions. 


The Undecided Voter 


A larger proportion of people than usual could not, or would not, 
tell interviewers how they were going to vote on election day. The 
Roper survey in August, 1948 found 15.4 per cent of the people undecided 
and Gallup and Crossley still had about 8 per cent undecided in October.
The polling predictions were computed with the undecided group omitted
on the assumption that these people, to the extent they voted, would 
distribute themselves among the presidential candidates in the same 
proportions as the decided voters. 

The mistake in this assumption was that the great majority of the 
undecided were not at a mid-point between the two major candidates. 
Many people were undecided between Truman and the minor party 
candidates or between Truman and not voting at all. There is direct 
and indirect evidence that the undecided vote went more heavily to 
Truman than to Dewey. 

The Survey Research Center of the University of Michigan asked 
people about their voting intentions in an October study which was being 
conducted for another purpose than election prediction. The question 
was asked to get a measure of political identification for correlational 
analysis. Since names and addresses were available the same panel was 
re-interviewed after the election and queried about their actual voting 
behavior. The people who originally said they did not know how they 
would vote, now reported that they voted for Truman in a ratio of two 
to one. Fewer of this undecided group reported that they voted than 
of the decided group. Though the national sample was small, the results 
are consistent with other findings. Similar evidence will be available 
from other panel studies. 

The indirect evidence comes from an examination of the undecided 
group in pre-election surveys. Roper’s results show that many more of 
the people, who did not know how they would vote, considered themselves 
Democrats than considered themselves Republicans. Roper also asked 
about such issues as rent control, social security measures, and the Taft- 
Hartley act. The undecided group were not consistent in their re- 
sponses, but on the whole, they resembled Truman supporters more than 
Dewey supporters. 

The undecided voter was thus one source of the polling error in 
prediction. But because the undecided group was after all a minority 
and because they did not turn out to vote as much as the decided group, 
they could not have contributed more than about one per cent of the 
five per cent prediction-error. 
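
The size of this effect can be illustrated with a rough calculation. The sketch below is not a reconstruction of any poll's procedure. It takes Gallup's 44.5 per cent for Truman among decided respondents and an undecided group of 8 per cent, then compares the pollsters' proportional assumption with a two-to-one split for Truman. The relative turn-out of 0.6 for the undecided is a purely hypothetical figure, and treating two-thirds of all undecided voters as Truman voters ignores the minor candidates.

```python
def adjusted_share(decided_share, undecided_frac, undecided_split, relative_turnout):
    """Candidate's per cent of the vote once undecided respondents are allocated.

    decided_share     -- candidate's per cent among decided respondents
    undecided_frac    -- fraction of all respondents who were undecided
    undecided_split   -- fraction of undecided voters choosing the candidate
    relative_turnout  -- turn-out of the undecided relative to the decided (1.0 = equal)
    """
    decided_voters = 1.0 - undecided_frac
    undecided_voters = undecided_frac * relative_turnout
    total_voters = decided_voters + undecided_voters
    votes = decided_share * decided_voters + 100.0 * undecided_split * undecided_voters
    return votes / total_voters


TRUMAN_DECIDED = 44.5   # Gallup's final figure for Truman
UNDECIDED = 0.08        # about 8 per cent undecided in October

# Pollsters' assumption: the undecided split exactly like the decided.
proportional = adjusted_share(TRUMAN_DECIDED, UNDECIDED, TRUMAN_DECIDED / 100.0, 1.0)

# Alternative: two to one for Truman, with lower turn-out (0.6 is hypothetical).
two_to_one = adjusted_share(TRUMAN_DECIDED, UNDECIDED, 2.0 / 3.0, 0.6)

shift = two_to_one - proportional
print(round(proportional, 1), round(two_to_one, 1), round(shift, 1))
```

Under these assumptions the Truman figure moves by roughly one percentage point, which is in line with the estimate given above. Varying the hypothetical turn-out figure changes the shift only modestly because the undecided group is small.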

The failure of the polls to study the undecided vote illustrates the 
lack of research design in their methods. It would have been possible 
to have set up systematic hypotheses about this group and explored the 
nature of their indecision, the reasons for their indecision, their basic 
political philosophy, etc. The Roper poll had some data on the un- 
decided group but it made no real use of the information it had. In 
the past the undecided vote was not a problem in election prediction 
and even in 1948 it may not have been a major factor. Nonetheless it
may loom larger in future elections. But more important is the
consideration that it is related in its psychological dimensions to the problem
of the changing voter. 


The Changing Voter


Another source of polling error was the fact that some people told the 
interviewer one thing and then behaved differently on election day. 
This distortion is twofold in nature. Part of the problem is a matter of 
interviewing skill and technique, in that people may give what seems 
like a socially acceptable answer in the interviewing situation. If Dewey 
is supposed to be the popular candidate, if his is the name they ordinarily 
hear in everyday conversation as the assured winner, and if they have 
some doubt about Truman, people may find it easier to say “Dewey” 
when asked the direct question about voting preference. There is no 
documentation for this possible source of error in the last election but 
it suggests the importance of thorough interviewing and real training 
of interviewers. 

The second part of the problem concerns genuine psychological change. 
In a difficult choice-situation some people may give one response to an 
interviewer but when confronted with the reality of the election booth 
they may change their minds.² Take, for example, the supporter of the
New Deal, dissatisfied with Truman, who says before election he will 
vote for any candidate save Truman. When the chips are down, how- 
ever, he returns to the party most representative of his beliefs. Another 
type of change is typified by the farmer who originally planned to vote 
for Dewey, became alarmed at the fall in farm prices and the Republican 
position on the support of farm prices, and voted in terms of what seemed 
to him his best self-interest. 

Panel studies give some support to the changing voter as a source 
of polling error. More of the people who had said they would vote
Republican reported after the election that they had voted Democratic
than the reverse.
It is not possible to estimate precisely how much this was responsible 
for the prediction failure. In a post-election survey (see Table 4), 
Gallup found that in general Dewey voters had made up their minds 
earlier in the campaign than Truman voters. 

These findings indicate that the 1948 political situation had a different 
psychological structure for many people than the Roosevelt elections. 
In the final analysis most people may have voted for the party which 
represented their welfare as they saw it. But they did not crystallize 
their beliefs until they had to. They finally reached a decision con- 
sonant with their basic attitudes. This may be why so many people 
who talked against Truman were so delighted with the election returns. 


² This hypothesis about the changing voter was suggested by R. Crutchfield.


Table 4 
Gallup Post-election Survey of Time Voters Made Up Their Minds 








Definitely Made Up 
Their Minds Truman Dewey All voters 





Before campaign started 46% 64% 54% 
Early in campaign 11 12 12 
First half, Oct. 

Second half, Oct. 13 

Election day 

Indefinite 





The lesson for election prediction, presented by the undecided and 
changing voter, is not primarily the necessity of polling until the last 
moment. Trend studies must be made, but adequate research in this 
field should be more than the projection of a single attitudinal trend. 
The polls can interview 48 hours before the election and still miss the 
voter who reacts differently to the reality of the election booth than to 
the straw ballot. The real lesson is that the determinants of political 
behavior must be systematically explored. We need to study how the 
voter perceives political parties and candidates; for example, to what 
extent is he politically-minded in viewing a candidate and a party as an 
instrument for protecting and improving his interests, to what extent is 
he reacting to the personalities of the candidates, etc. We need, further- 
more, to investigate the basic social, economic, and political beliefs and 
their relative importance to him. 

To do thorough studies of this kind requires much more theoretical 
planning than the polls have thus far done. They occasionally ask 
questions on issues, but they have not systematically designed studies to 
give answers to problems of political motivation. These studies, whether 
conducted by the pollsters or by psychologists, are indispensable to the 
making of predictions. In addition to better research planning, the use 
of intensive interviewing, even on a pre-test basis, could get the signifi- 
cant frames of reference in which people are thinking. The usual polling 
pre-test is one of testing question-wording, not one of the experimental 
investigation of the dimensions of the problem under study. Lazarsfeld 
has pointed out how adequate pre-testing with intensive interviewing 
could be used to develop more valid ballots with pre-coded answers.³


The Representativeness of the Sample 


To estimate turn-out, to allocate the undecided vote, to gauge the
stability of voting preference all require good interviewing and research
design which goes beyond the direct question of voting intention
into the related causal factors. In addition, however, there is the
problem of sampling, of obtaining a truly representative cross-section of the
electorate. The quota-control method of the polls has been under fire for
some time and since it is a more palpable weakness than lack of study
design, the controversy over polling methods will focus unduly about it.

³ P. Lazarsfeld. The controversy over detailed interviews. Publ. Opin. Quart.,
1944, 8, 38-60.

The quota-control method sets up a cross-section which in theory 
represents the larger population proportionately in terms of sex, age, 
socio-economic status, urbanization, and geographical area. Inter- 
viewers are assigned quotas on this basis and told to bring back results 
from respondents of given characteristics. It is sometimes contended 
that the quota-control method is vulnerable because it does not stratify 
on some variable related to voting behavior such as union membership. 
This argument fails to get at the essential weakness of quota-control 
sampling. If the cross-section obtained by the quota method really 
achieved a random representation of the population according to the 
controls it employs, the chances are all in favor of other characteristics 
such as religion, occupation and even union membership being properly 
represented. 

The real defect of the quota-control method is in its execution. Since 
there are no strict controls over interviewers, they in fact select the 
sample. The result is not a random, or true probability sample. Inter- 
viewers are told to bring back results from so many respondents in the 
D, or below-average economic category. What constitutes a D re- 
spondent and how D respondents are to be selected is too much a matter 
of interviewer judgment. In practice interviewers filling their quotas 
take people who are physically and psychologically more accessible. As 
middle-class members themselves they under-represent the poorer people 
and to some extent the very wealthy. Since the wealthy are much less 
numerous, these are not compensating errors. Moreover, interviewers 
tend to get respondents more like themselves on other counts than would 
be found in a truly representative sample. 

The under-representation of the lower income groups is in evidence 
whenever a quota sample is broken against some measure indicative of 
socio-economic status such as education or telephone ownership.
Uncorrected quota samples employing no special devices to limit
interviewers traditionally find between 12 and 20 per cent too few people in
the lower education brackets.

In 1940 and 1944 Gallup and Crossley corrected indirectly for the 
quota bias by adjusting for past voting behavior. In 1948 Gallup also 
used the respondents’ answers to questions on education to correct his 
final sample. In spite of corrections Gallup and Crossley never suc- 
ceeded in the Roosevelt elections in eliminating their Republican over- 
estimates. 













Roper does not employ corrections but stands by the raw data from 
his sample. In the Roosevelt elections he had the advantage of sizable 
compensating errors in that his southern sample was much too Demo- 
cratic and his northern sample much too Republican. In 1940, for ex- 
ample, Roper overestimated the Democratic vote in the East South 
Central states by 12.5 per cent and in the South Atlantic states by 8.3 
per cent. To balance this, however, he underestimated the Democratic 
strength in the West North Central states by 6.8 per cent and in the 
Mountain states by 10.5 per cent. In 1948 the candidacy of Governor 
Thurmond knocked this compensating error into a cocked hat. In some 
southern sections Roper interviewers found Truman and Thurmond 
tied. Without his usual southern overweighting of the Democratic vote, 
there was nothing to compensate for the northern Republican inflation 
and Roper after three highly accurate predictions was not even close in 
1948. If his figures are corrected for the educational bias in his quota 
sample, however, the Roper figures are very much like the Gallup and 
Crossley predictions. 

It is clear, then, that the largest part of the Roper error was due to 
the uncorrected quota-control method of sampling. It is not clear, 
however, how much of the remaining error in prediction (about five per 
cent) is due to poor sampling, which cannot be corrected for, and how 
much to non-sampling factors. Those who defend quota sampling admit 
that area, or probability, sampling would have given slightly greater 
accuracy but they dismiss sampling as a minor factor. Their logic on 
this score is interesting in that their final argument is that quota sampling 
costs less than area sampling. 

That poor sampling did contribute to the polling error in spite of the 
corrective adjustments made in the data seems a sound interpretation 
for these reasons: 


1. Even in the Roosevelt elections, with a highly structured situa- 
tion, with attitudes and opinions well crystallized before election day, 
Gallup and Crossley were not able to correct away their inflation of the 
Republican totals. In 1940 Gallup missed 16 states by three percentage 
points or more, but only one of these errors was in the direction of over- 
estimating the Democratic vote. In 1944 he was off the mark by three 
percentage points or more in 22 states. Only two of these errors were 
overpredictions of the Democratic vote. Similarly, Crossley had 13 state 
errors, in 1944, of three percentage points or greater, but only one of these 
favored the Democratic candidate. 

2. Corrective adjustments introduced into data to compensate for 
poor sampling are always limited by the poor sampling that was done in 
the first place. To inflate an under-represented group by some corrective
weight does improve the whole sample to some extent relative to the
neglected group, but it cannot insure the representativeness of this group. 
If, for example, we weight up the people with no better than grade school 
education by fifteen per cent, we still have not improved the character 
of the sample for this group even though we have improved its place in 
the sample. 

3. The area method of sampling was more accurate in the 1948 
elections than the quota sample but it was used in too limited a way to 
draw definite conclusions. The study of the University of Michigan 
Survey Research Center, previously referred to, used a national area 
sample and found an even division among the decided voters between 
Truman and Dewey. This evidence is limited in that the sample was 
small and in that the Center’s methods of interviewing also differ from 
polling methods. The Elmira study of Lazarsfeld, using an area sample, 
missed the vote in that town, however, by six per cent. Gallup’s quota 
sample for New York state was also six per cent in error. The clearest 
evidence is from the University of Washington Survey group which tried 
both area and quota sampling for the state of Washington. The area 
sample had an error of 2 percentage points; the quota sample an error 
of 7 percentage points. 
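
The second reason above, that weighting repairs a group's share of the sample but not the quality of the cases drawn from it, can be illustrated with a small arithmetic sketch. Every figure in it is hypothetical and chosen only for illustration; the group names, shares, and support rates are assumptions and do not come from any poll discussed here.

```python
# Hypothetical illustration: weighting fixes a group's share of the sample,
# not the representativeness of the people interviewed within that group.
# All figures are invented for illustration; none come from the article.

population_share = {"grade_school": 0.40, "more_schooling": 0.60}
true_support = {"grade_school": 0.60, "more_schooling": 0.45}

# An assumed quota sample that under-represents the grade-school group and,
# within it, reaches respondents less favorable than the group as a whole.
sample_share = {"grade_school": 0.25, "more_schooling": 0.75}
sampled_support = {"grade_school": 0.50, "more_schooling": 0.45}


def estimate(shares, support):
    """Overall support as a share-weighted average of group support."""
    return sum(shares[group] * support[group] for group in shares)


truth = estimate(population_share, true_support)
unweighted = estimate(sample_share, sampled_support)
# Weighting restores the group's population share but still relies on the
# unrepresentative interviews that were actually obtained.
weighted = estimate(population_share, sampled_support)

print(f"true support:        {truth:.4f}")
print(f"unweighted estimate: {unweighted:.4f}")
print(f"weighted estimate:   {weighted:.4f}")
```

Under these assumed figures the weighted estimate moves toward the true value but stops short of it, because the bias within the neglected group is untouched by the weight.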


The interpretation of this evidence is obscured by the fact that neither 
method of sampling was followed according to its literal requirements. 
In the case of the quota method, the interviewers who used it were new 
to this method of sampling. They reported that they were unable to 
fill the lower income quota in 20 per cent of the cases. Whether this is 
the usual difficulty with the quota method which happened to be re- 
ported here because of the newness of the interviewers or whether this 
is unusually inadequate quota sampling is a matter of debate. The 
area sample also was not carried out perfectly and utilized liberal sub- 
stitutions. Nevertheless, the final figures show a superiority of the area 
sample over the quota method of five percentage points. 

Though it is unlikely that all of the prediction error was due to quota 
sampling, it may have contributed between one and three percentage 
points to the Gallup and Crossley underestimations of the Truman vote. 
If this estimate is correct, then area-sampling would have indicated a 
closer election and counselled caution on all-out predictions. 
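
The one-sided pattern of state errors cited in the first reason above can be put through a simple sign test. The sketch below assumes that, absent systematic bias, each large state error is equally likely to favor either party and that state errors are independent of one another; the second assumption is a strong one and is made here only for illustration. The function name is hypothetical.

```python
from math import comb


def one_sided_sign_test(n_errors, n_minority):
    """P(X <= n_minority) for X ~ Binomial(n_errors, 0.5).

    n_errors:   states missed by three percentage points or more
    n_minority: how many of those errors overestimated the Democratic vote
    """
    favorable = sum(comb(n_errors, k) for k in range(n_minority + 1))
    return favorable / 2 ** n_errors


# Counts taken from the text: (large state errors, errors favoring Democrats).
cases = {
    "Gallup 1940": (16, 1),
    "Gallup 1944": (22, 2),
    "Crossley 1944": (13, 1),
}

for label, (n, k) in cases.items():
    p = one_sided_sign_test(n, k)
    print(f"{label}: {k} of {n} errors favored the Democrats, p = {p:.6f}")
```

Probabilities this small would be hard to reconcile with unbiased error, which is the point the first reason makes in words.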


Applied Psychology and the Polls 


The growing criticism of the polls in the field of applied psychology 
has already become more sharply focussed with the 1948 prediction 
failure. Criticism has been directed at two main phases of polling opera- 
tions: (1) the failure of the polls to keep abreast of technical and methodo-
logical advances in pure and applied social psychology or to do methodo-
logical research of their own; and (2) the reluctance of the polling agencies 
to make public their data and their procedures such as sample size, the 
exact corrective adjustments employed, etc. Both of these points were 
made by the technical committee of social scientists, serving the Con- 
gressional Committee which investigated Gallup in 1944. 

This criticism is undoubtedly justified but it should not lead to a 
blanket condemnation of the public opinion polls. They have made real 
contributions in the past in stimulating a quantitative and factual 
approach to problems once dealt with by journalistic reporting or arm- 
chair political science. They can make greater contributions in the 
future if they take stock of their methods. The 1948 setback should 
be of real value to them in that they may see that they have been ham- 
pered by a blind empiricism in the past. This empiricism led them to 
feel that what had apparently worked once or twice or even three times 
was somehow sacred and could be relied upon to work in the future no 
matter how conditions changed. 

Nor should the failure of the public opinion polls be construed as an 
indictment of all research in the field of consumer needs and wants.
There are many studies in this field to which the weaknesses of polling 
techniques do not apply. As in any new research, standards and methods 
in measuring consumer reaction vary considerably. It is of interest 
that some market research organizations, interested in sound methodo- 
logical development, had accepted true probability sampling before 
November 1948. 

Though social psychologists in general may want to work for im- 
provements in polling methods, it is important to distinguish between 
the polls and basic research in social psychology and the social sciences. 
Field work employing quantitative measurement on social psychological 
problems should not be confused with polling any more than the labora- 
tory work of the psychologist on problems of perception should be con- 
fused with market research. Though there are areas of overlap the 
tendency of the layman to confuse basic and applied research should not 
mislead the professional worker. This does not mean that the social 
scientist should be completely divorced from applied research. It is 
his responsibility to help formulate standards of research that will help 
both types of research. Such standards are needed in the public interest 
and by the polls themselves. They cannot afford a repetition of their 
1948 experience. 


Received December 21, 1948. 





Note on the Cardall Practical Judgment Test 


Dorothy H. Carrington 


Institute for Psychological Services, Illinois Institute of 
Technology, Chicago, Illinois 


The Cardall Practical Judgment Test (2) was given to over 300 un- 
selected men who had come for vocational guidance to the Illinois Institute 
of Technology. Their age range was from 16 to 63 years and the educa- 
tional level from 8th grade to persons holding the Ph.D. degree. All 
subjects also took: the Adams-Lepley Personal Audit Test (1); the ACE 
Psychological Examination (1942, college edition) (5); and the Otis 
Gamma (4). Scores on each of these tests were correlated with the 
Practical Judgment scores. The scores of the Practical Judgment Test 
were also correlated with age and education. 

The results are given in Table 1. 


Table 1

Correlations between Practical Judgment Test and Other Variables

                                        No. of                  Standard
Variables                               Cases     r             Error
Age                                      361      [illegible]   .0528
Education                                349      [illegible]   .051
ACE Total                                310      [illegible]   .046
ACE Quantitative                         311      [illegible]   .053
ACE Linguistic                           307      .26           .052
Otis Gamma                               344      [illegible]   .046
Personal Audit
  Seriousness-Impulsiveness Scale        315      [illegible]   .0563
  Firmness-Indecision Scale              316      [illegible]   .054
  Tranquillity-Irritability Scale        315      [illegible]   .0564
  Frankness-Evasiveness Scale            316      [illegible]   .0559
  Stability-Instability Scale            313      [illegible]   .0567
  Tolerance-Intolerance Scale            314      [illegible]   .0565
  Steadiness-Emotionality Scale          307      [illegible]   .055
  Persistence-Fluctuation Scale          309      [illegible]   .0567
  Contentment-Worry Scale                309      [illegible]   .0568
Split Half r corrected by Spearman
  Brown Formula                          275      .69           .013

[Entries in the "Significance at" columns are illegible.]







The split half reliability of the scores on the Practical Judgment Test 
in the sample used is .69 corrected by the Spearman Brown Formula. 
There is a low but statistically significant positive correlation between 
the Cardall Test and intelligence as measured. The correlation with 
formal education is also low but significant. For the most part, the 
Cardall Test does not correlate with the sub-parts of the personality 
test used although on Firmness-Indecision, Stability-Instability, and 
Steadiness-Emotionality there is a slight positive correlation which is 
significant. For the sample studied, the Cardall Test scores are in- 
dependent of age. 
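
The Spearman-Brown correction mentioned above has a standard closed form, r_full = 2 r_half / (1 + r_half). The sketch below inverts it to recover the uncorrected half-test correlation implied by the reported .69. The function names are hypothetical, and the half-test value is derived here rather than reported in the study.

```python
def spearman_brown(r_half):
    """Full-length reliability estimated from the correlation between halves."""
    return 2 * r_half / (1 + r_half)


def half_test_correlation(r_full):
    """Invert the Spearman-Brown formula to recover the half-test correlation."""
    return r_full / (2 - r_full)


r_full = 0.69  # corrected split-half reliability reported in the text
r_half = half_test_correlation(r_full)

print(f"implied correlation between halves: {r_half:.3f}")
print(f"check, corrected back up:           {spearman_brown(r_half):.2f}")
```

The implied correlation between the two halves is thus noticeably lower than the corrected figure, which bears on the conclusion that the test is not reliable enough for individual prediction.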

Cardall correlated his test with the Army Alpha, Link’s Personality 
Quotient, Bell’s Adjustment Test and college grades and found no 
significant correlations. His correlation of the Practical Judgment and 
intelligence was -.05. This differs by 42 points from the results obtained
in the present study. He does not give the number of persons in
his sample nor the type of population on which they are based so it is 
difficult to tell what factors are causing the discrepancy. 

The data in this study differ from the data in Cardall’s Manual (2) 
in that in the Illinois Institute of Technology group, practical judgment 
as measured is not totally independent of intelligence and academic back- 
ground, and there is an indication that some personality factors influence 
test scores. According to the sample studied the reliability of the test
is too low for the test to be used for individual predictions.

Received July 15, 1948.


References 


1. Adams, C. R. Manual of directions for the Personal Audit Test. Chicago: Science 
Research Associates, 1945. 
2. Cardall, A. J. Manual of directions for the Practical Judgment Test. Chicago: 
Science Research Associates, 1942. 
3. Guilford, J. P. Fundamental statistics in psychology and education. New York: 
McGraw-Hill Company, 1942. 
4. Otis, A. S. Manual of directions for Otis Quick-Scoring Mental Ability Tests.
Yonkers-on-Hudson: World Book Company, 1937.
5. Thurstone, L. L., and Thurstone, Thelma G. Psychological examinations for college
freshmen. Washington, D. C.: The American Council on Education, 1942.





Originality Ratings of Department Store 
Display Department Personnel 


Catherine P. Dougan, Ethel Schiff and Livingston Welch 
Institute for Research in Clinical and Child Psychology, Hunter College 


In this study we attempt to measure creative thinking of the em- 
ployees in the display department of R. H. Macy’s, by means of the 
Welch Reorganization Test (1, 2) which obliges the subject to recombine 
familiar ideas according to four different patterns. It is Welch’s (1) 
assumption that the ability to recombine easily and reorganize ideas 
according to a specific plan is essential to all types of creative thinking. 
His contention is not that this is the only factor involved, but the indi- 
vidual lacking this ability will be seriously handicapped in an imaginative 
capacity. 

In two previous studies the Reorganization Test was given to 30 
professional artists, 25 art majors and 48 unselected students. We will 
compare the results of these investigations with those obtained in the 
present study. 


Procedure 


1. The Reorganization Test. The test is divided into four parts. The first 
three sub-tests make use of written material and the fourth makes use of blocks. 
The total testing time is 26 minutes. 


Part 1. Instructions 


Recombine the words of each group on the next page to make as many 
meaningful grammatical sentences as possible. For example, here is a group 
of ten words, 


MEN SKY IS FIGHT THAT THE SLOW BRIGHT OF FOR 


which can be recombined in the following sentences: 


Men fight for the sky. 
The sky is bright. 
The fight is slow. Etc.


You will receive as much credit for a short sentence as for a long one. Your
sentences do not have to be artistic, but they must be grammatical. There 
must be at least a subject and a predicate. You will receive credit for a sen- 
tence which is only slightly different from another. A word from the group 
can be used only once in the same sentence, but it may be used any number of 
times in other sentences. Only use words from the group that you are examin- 
ing at the time. You may skip from one group to another, if you like. 




There are ten of these groups and you have only ten minutes in which to 
complete the test. Are there any questions? . . . Do not turn the page until 
the examiner says “Start.” 

The following are the ten groups of Part 1: 


1. Dog tree climbs runs those a smooth good by with
2. City John built stood a that large strong of from
3. Car fence travels was this that big cool for by
4. Sea woman move could these the green rough with of
5. Den lion ate is big deep these the of by
6. House child left has blue frightened the a for by
7. Lemon wife cooks finds that soft round with from
8. Potatoes maid cut once small hot these a of for
9. Fish boy waits catches the a long cold by from
10. Slowly the golden light that rested upon them moved away
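
The mechanical part of the Part 1 rules (use only words from the group under examination, and no word more than once in a sentence) can be checked automatically; whether a sentence is grammatical still requires a human scorer. The following is a minimal sketch of such a check, not the authors' scoring procedure, and the function name is hypothetical.

```python
import re
from collections import Counter


def uses_group_words_once(sentence, group):
    """True if every word in the sentence comes from the group and no word
    is used more often than it appears in the group.

    Grammaticality is NOT checked here; that still needs a human scorer.
    """
    allowed = Counter(word.lower() for word in group.split())
    used = Counter(re.findall(r"[a-z]+", sentence.lower()))
    # A word absent from the group has an allowed count of zero.
    return bool(used) and all(used[w] <= allowed[w] for w in used)


example_group = "MEN SKY IS FIGHT THAT THE SLOW BRIGHT OF FOR"

for sentence in ["Men fight for the sky.",
                 "The sky is bright.",
                 "The sky is the sky.",   # repeats words within one sentence
                 "The dog is slow."]:     # uses a word outside the group
    print(sentence, uses_group_words_once(sentence, example_group))
```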


Part 2. Instructions 


Make as many letters as possible using no more and no less than three 
straight lines. For example, the letter A is made with three straight lines, two 
slanting downward and one across. You will be given no credit for the letter 
A, since it is an example.

Make as many letters as possible, using no more and no less than two 
straight lines. 

Make as many letters as possible, using no more than one straight line and 
one semi-circle. 

The time limit is three minutes. 


Part 3. Instructions 


On the next page you will be given a list of twenty words which you are to 
connect into a story. You must be certain to use the words in the order in 
which they appear on the list. If the first word is “tree” in your story this 
must be the first word which appeared on the list. You must not skip any of 
the words. 

Your story must be grammatical and logically related. It must have a 
beginning and an end. You will be rated on the number of words you make 
use of in the time allotted. Write as fast as you can and underline each of the 
twenty words as you use it. 

The time limit is three minutes. 

The words used in this test were: 


STAIRS OCEAN CHEMISTRY SONG TEST MOUNTAIN BUBBLE DOG 
LEMON PICTURE POST BLANKET VIOLIN LAMP NIGHTMARE 
STEAM LEG WINDOW SWAMP STAMP 

(The words were given in this order.) 


Part 4. Instructions 


The object of this test is to construct out of ten blocks on each trial as many 
pieces of furniture or home furnishings as possible. The pieces of furniture you 
construct must fit properly. It must be symmetrical and be recognizable as a 
piece of furniture. Do not attempt to be futuristic. Use conventional forms. 
You must use a minimum of two blocks to construct a piece of furniture. You 
can make as many of the same type of furniture as you like. You will receive 
full credit for the same type that is only slightly different from another. 

You have only ten minutes to complete this test. There are five trials. 
Hence, you have only two minutes for each trial. 







The blocks used in all five trials were geometric shapes selected from a box 
of playing blocks. On each trial the blocks were presented to the subject on a 
piece of cardboard with each shape outlined so that the positions of the blocks 
were standardized. A record was kept of all of the combinations of blocks for 
which credit was given. 

2. Rating Scale. Each subject in the display department was rated on a 
five point scale by the manager and by the assistant manager of that depart- 
ment. These men rated their employees independently; however, it must be 
borne in mind that these two men, over a period of time, must have exchanged 
ideas as to the originality or creative thinking of their employees. 


Subjects 


In the present study the Reorganization Test was given to a total of 
33 employees in the display department of R. H. Macy’s. Their indi- 
vidual positions ranged in talents and included: artists, window, 
show-case and floor display men, designers, stylists, and the executives 
in charge. 


Results 


The test results of these 33 department store employees in the display 
department were compared with those of the three groups, 30 professional 
artists, 25 art majors, and 48 unselected students, reported in the previous 
study. The mean scores and standard deviations for each group on 
each part of the test are shown in Table 1. 


Table 1

The Mean Performance Scores and the Standard Deviations
for Each Group on Each Part of the Test

              Professional          Art                   Display               Unselected
              Artists               Majors                Personnel             Students
              N = 30                N = 25                N = 33                N = 48
Part          Mean   S.D.           Mean   S.D.           Mean   S.D.           Mean   S.D.
1             17.7    7.2           21.9   6.?            10.8   [illegible]    18.0    4.2
2             12.5    1.9           13.2   [illegible]    11.9   [illegible]     6.7    1.8
3             11.4    4.1            7.3   [illegible]     6.8   [illegible]     9.1    3.2
4             18.4    7.8           13.9   [illegible]    14.0   [illegible]     3.4    2.7
Total Score   60.5   12.3           56.4   5.?            43.1   18.1           37.6    7.0





It will be seen that the display personnel compare almost equally 
with the unselected student in total score, whereas both are considerably 
lower than the art majors and professional artists. The only sub-test 
in which there is a striking difference of score is in Part 4, which is 
concerned with the construction of furniture with blocks. It is inter- 
esting to note, therefore, that a large part of the department store
personnel tested were employees of the furniture and interior decorating
departments. 

The difference between professional artists, unselected students, and 
art majors has already been mentioned in the previous studies. 

All of the differences between sub-tests were put to test and some 
significant t-values were obtained. The t-values obtained for differences 
between the means of the four groups on each part of the test and be- 
tween the total score are presented in Table 2. 


Table 2

The t-Values Obtained for Differences Between Means of Groups*

              Prof. Artists   Prof. Artists    Art Majors       Display Pers.   Display Pers.
              and             and Unselected   and Unselected   and Prof.       and Unselected
Parts         Art Majors      Students         Students         Artists         Students
1             2.1             0.2              3.0              4.0             6.0
2             1.5             13.2             17.1             [illegible]     11.5
3             4.4             1.1              2.4              4.5             2.8
4             2.0             10.2             7.5              1.9             6.8
Total Score   1.1             10.9             7.2              4.5             1.91

* t.05 = 2.0; t.02 = 2.3; t.01 = 2.6.





It appears that, for the total test score, the difference between the 
display personnel and the professional artists is statistically significant, 
while that between the display personnel and the unselected students is 
not. Parts 2 and 4 of the test seem especially important. In all cases 
except between the display personnel and the unselected students, the 
differences between the groups on these two parts seem to be consistently 
significant. 
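
A t-value of this kind can be approximated from the group means, standard deviations, and group sizes alone. The sketch below uses the unpooled standard error of a difference between two independent means; the paper does not state which formula was used, so this choice is an assumption, and the function name is hypothetical. The inputs are the total-score figures given in Table 1 for the professional artists and the display personnel.

```python
from math import sqrt


def t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """t for the difference between two independent means.

    Uses the unpooled standard error sqrt(sd_a^2/n_a + sd_b^2/n_b); this is
    an assumed formula, since the paper does not say how t was computed.
    """
    standard_error = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# Total scores: professional artists (N = 30) vs. display personnel (N = 33).
t_total = t_from_summary(60.5, 12.3, 30, 43.1, 18.1, 33)
print(f"t = {t_total:.2f}")
```

With these inputs the result is about 4.5, in line with the corresponding total-score entry in Table 2 and well above the tabled .01 level of 2.6.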

In order to determine the degree to which the test results agreed with 
the creative ratings given by the department managers, the test scores 
and the ratings were analyzed. A contingency coefficient of .60 was obtained.
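
The summary of this study identifies the coefficient as a contingency coefficient. Assuming Pearson's form, C = sqrt(chi-square / (chi-square + N)), and taking N as the 33 employees tested, the chi-square implied by C = .60 can be recovered as sketched below. Both the choice of Pearson's form and the value of N are assumptions, and the function names are hypothetical.

```python
from math import sqrt


def contingency_coefficient(chi_square, n):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + N))."""
    return sqrt(chi_square / (chi_square + n))


def implied_chi_square(c, n):
    """Solve C = sqrt(chi2 / (chi2 + N)) for chi2."""
    return c ** 2 * n / (1 - c ** 2)


n_subjects = 33      # assumed: all display employees tested were also rated
c_reported = 0.60    # coefficient reported in the text

chi_sq = implied_chi_square(c_reported, n_subjects)
print(f"implied chi-square: {chi_sq:.4f}")
print(f"round trip C:       {contingency_coefficient(chi_sq, n_subjects):.2f}")
```

Note that C has an upper limit below 1.0 that depends on the number of rating and score categories, so .60 cannot be read on the same scale as a product-moment r.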

It was considered that perhaps Part 4, alone, would be of high enough 
reliability for judging creative thinking. However, when the results of 
Part 4 were analyzed it was found that the P value for this sub-test when 
used singly was small enough to cast doubt on the hypothesis. 


Summary and Conclusions 


The purpose of this study was to measure originality of department 
store display personnel. A special test was constructed by Welch in 
which the subjects were obliged to recombine familiar ideas according to a
series of four different patterns. The subjects were rated by their
supervisors by means of a 5 point rating scale to provide performance data 
with which to correlate the Reorganization Test results. 

1. A contingency coefficient of .60 was obtained between the ratings 
given by the manager and the scores resulting from the Reorganization 
Test. 

2. The performance of the display personnel was compared with that 
of subjects examined in a previous study. The mean scores for the four 
groups are as follows: professional artists, 60.5; college art majors, 56.4; 
department store display personnel, 43.1; and unselected students, 37.6. 
The difference between the professional artist and the display personnel 
is statistically significant while that between the unselected student 
and the display personnel is not. However, the display personnel were 
superior to unselected students on two of the sub-tests. 

3. These results indicate that there is a possibility of measuring 
originality, as it would apply in the fields of advertising and display. 


Received July 12, 1948. 


References 


1. Welch, L. Recombination of ideas in creative thinking. J. appl. Psychol., 1946, 
30, 638-643. 

2. Welch, L., and Fisichelli, V.R. The ability of college art majors to recombine ideas 
in creative thinking. J. appl. Psychol., 1947, 31, 278-282.








The Rosenzweig Picture-Frustration Study in the Selection 
of Department Store Section Managers 


H. Wallace Sinaiko 
L. Bamberger & Co., Newark, N. J. 


This report is an outgrowth of a study of certain intelligence and 
personality characteristics of Department Store Section Managers.¹ A
battery consisting of two tests of mental ability and one measure of 
personality was administered to a group of 53 of 58 employed Section 
Managers. 

Findings with regard to the two intelligence tests were essentially 
negative: correlations between test scores and a quantitative rating of 
job performance were so low as to be chance deviations from zero. 
Similar treatment of scores from the personality test—the Rosenzweig 
Picture-Frustration Study—produced more fruitful results in terms of 
predicting job performance in Section Managing. 


Method 


The Instrument. The P-F Study consists of 24 cartoon-like pictures 
in booklet form. Each picture illustrates a frustrating situation involving 
two or more people. One figure is shown saying something about the 
situation while the caption box over the second person is blank. The 
subject is told to write the first reply that comes to his mind in the blank 
over the person being addressed in the picture. 

Six principal scores are derived from the P-F Study. Responses are 
categorized according to direction of aggression and type of reaction. 
“Direction” categories include the following: (1) Extrapunitiveness,— 
aggressions directed by the subject toward someone or something in the 
frustrating situation; (2) Intropunitiveness,—aggressions directed by the 


¹ A definition of the job “Section Manager” appears in the Dictionary of Occupational
Titles, Part I, page 576, as follows: “Manager, Floor; aisle man; floorman; manager,
section; (retail trade); 0-75.10; supervises employees in a designated section of the selling
floor; instructs new workers and sees that they follow store system in making sales;
shifts selling personnel from one department to another so that service will be efficient
and prompt; regulates lunch hours and grants permission for employees to leave the
floor; handles returned goods, approves bank checks, and adjusts claims or refers them
to the adjustment department; answers customers’ questions relative to merchandise or
location of merchandise; floor-walker (almost obsolete).”




subject toward himself; (3) Impunitiveness,—the absence of aggressive 
feeling. “Types” of reaction are: (1) Obstacle-Dominance,—the prob- 
lem, or situation, is predominant in the subject’s response; (2) Ego- 
Defensive,—blame, or responsibility, is assigned for what has happened; 
(3) Need-Persistence,—a solution to the problem is mentioned. The 
scoring categories are symbolized by the letters E, I, and M, each corresponding
to the “direction” of aggression. “Type” of reaction is signified
by the use of the symbols O-D, E-D, and N-P.

The Group. As mentioned above, 91% of the 58 Employed Section 
Managers comprised the experimental group. Breakdown of the group 
by sex was: 44 women and 9 men, 83% and 17% respectively. Length 
of time on the job ranged from three months to 16.7 years (median = 18
months, Q1 = 6.5 months, and Q3 = 66 months). Formal education
ranged from eight years to seventeen years. One-eighth of the group
had not completed high school, 50% had completed one or more years of
college, and 15% were college graduates. Age ranged from 21 to 57 years
(median = 30, Q1 = 24, and Q3 = 36).

The Criterion. A quantitative measure of job performance was built 
with information obtained from Executive Personnel History forms. 
This is a modified linear rating scale used throughout the company in its 
semi-annual personnel review and rating of all executives. Executives, 
or Section Managers in this case, are rated on six basic qualities, each 
being subdivided into from two to ten categories. These qualities include 
Character, Intelligence, Intuition, Experience, Adaptability, and Special 
Skills. Ratings are made on the subdivisions under each main category. 
For example, Intuition is rated for each of two points: ‘Are decisions 
based on limited data usually correct?” and “Are decisions arrived at 
without undue delay?” 

Actual ratings assigned to the subdivisions are confined to the fol- 
lowing values: Outstanding, Above Average, Average Plus, Average, 
Below Average, and Unsatisfactory. 

Each Section Manager is rated by one of four Floor Superintendents. 
All ratings are checked by the Chairman of Personnel Reviews. Thus, 
there is a “common denominator” roughly operating to keep ratings by 
different supervisors comparable. All Section Manager ratings used were 
on a minimum of three months’ service on the job. 

To convert the above descriptive ratings into quantitative terms,
arbitrary weights were assigned as follows: Outstanding, 11; Above
Average, 9; Average Plus, 7; Average, 5; Below Average, 3; and
Unsatisfactory, 1. Mean point values were computed for each of the six basic
qualities, weighted, and summated.

A seventh basic category on the Executive Personnel History form,
“Placement and Development,” was treated slightly differently. The
subdivisions, “Is he well placed on his present job?” and “Is he satisfied
with his present status?”, were weighted as follows: Yes, 10; Yes, qualified,
5; No, -10; and No, qualified, -5. This weighted score was algebraically
added to the summated averages of the preceding six rated qualities.
This final figure gave us a quantitative criterion measure for each Section 
Manager. A frequency distribution of ratings for the entire group 
showed a range of approximately 50 points (24.6 to 74), a mean of 58.9, 
and a standard deviation of 9.6. 
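
The construction of the criterion can be sketched as follows. The rating weights and placement weights are those given above; everything else is an assumption made for illustration: the six qualities are summed with unit weights (the weights applied to the qualities are not reported), the two placement answers are simply added together, and the example ratings are hypothetical rather than those of any Section Manager in the study.

```python
# Weights stated in the text.
RATING_WEIGHTS = {
    "Outstanding": 11, "Above Average": 9, "Average Plus": 7,
    "Average": 5, "Below Average": 3, "Unsatisfactory": 1,
}
PLACEMENT_WEIGHTS = {
    "Yes": 10, "Yes, qualified": 5, "No, qualified": -5, "No": -10,
}


def criterion_score(quality_ratings, placement_answers):
    """Sum of mean point values per quality, plus the placement adjustment.

    Assumes unit weights across the six qualities and that the two placement
    answers are added together; neither detail is given in the article.
    """
    quality_total = sum(
        sum(RATING_WEIGHTS[rating] for rating in ratings) / len(ratings)
        for ratings in quality_ratings.values()
    )
    placement_total = sum(PLACEMENT_WEIGHTS[a] for a in placement_answers)
    return quality_total + placement_total


# Hypothetical ratings for one Section Manager (subdivision counts invented).
example_ratings = {
    "Character": ["Above Average", "Average Plus", "Average"],
    "Intelligence": ["Average Plus", "Average Plus"],
    "Intuition": ["Above Average", "Average"],
    "Experience": ["Average", "Average"],
    "Adaptability": ["Outstanding", "Above Average"],
    "Special Skills": ["Average Plus", "Below Average"],
}
score = criterion_score(example_ratings, ["Yes", "Yes, qualified"])
print(f"criterion score: {score:.1f}")
```

Under these assumptions the example falls within the reported range of 24.6 to 74, close to the reported mean.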


Results 


Ratings of each Section Manager’s job performance and the six 
principal scoring categories of the P-F Study were correlated (Pearsonian 
product-moment r). Table 1 summarizes these relationships as well as 
those between P-F Study scores and length of service. 


Table 1

Pearson Correlations between Rosenzweig Picture-Frustration Study Scores, Length
of Service, and Job Ratings of 53 Department Store Section Managers

                                Picture-Frustration Study Scores
                        E        I        M        O-D      E-D      N-P
Length of Service      -.23     .15      .?9**    .11      -.12     [illegible]
Job Ratings            -.31*    .28*     .25      -.02     -.48**   .38**

* Significant at the 5% level.
** Significant at the 1% level.


Discussion 


Length of Service. The distribution of this variable had marked 
skewness toward the right, or longer period of time on the job. The 
mean length of time in Section Managing was approximately 34 months 
while the median was only 18 months. Hence, there seems to be a fairly 
high rate of turnover, with only a small group of “long stays” in the job. 

One score on the P-F Study showed a statistically significant relation- 
ship to length of service: Impunitiveness. Thus, there is a tendency 
among the longer-staying Section Managers to show more M (minimizing,
absence of blame-placing, conformity) in their responses than in
the more recently hired of the group.²

² Correlations were run between age and P-F Study scores. One significant
relationship, r = -.28, p = .05, was found between Extrapunitiveness and this variable.
All other correlations between age and P-F Scores approximated zero.







Job Ratings. There were four statistically significant correlations 
between P-F Study scores and job ratings. Keeping in mind the re- 
quirements of a Section Manager’s duties, these relationships follow a 
logical pattern. There was a negative correlation between the criterion 
and E: r = -.31, p = .05. Better Section Managers show relatively fewer
extrapunitive, aggressive, responses. Management requires of its Section 
Managers a constant display of good-will in their customer contacts. 
A large number of these contacts occur under strained circumstances 
produced by such things as complaints about quality of merchandise, 
non-delivery, or service, etc. Section Managers must obviously refrain 
from any show of aggressiveness in handling these adjustments if they 
are to maintain customer friendship toward the store. 

The correlation between the criterion and I, r = .28, p = .05, indicates
that better Section Managers show a tendency to turn their aggressions 
against themselves. Intropunitiveness may be a necessary adjunct of 
efficient Section Managing. The somewhat hackneyed phrase, ‘The 
customer is always right,” is an attitude actually encouraged by Manage- 
ment. In other words, store policy regarding customer relations is itself 
an intropunitive one. 

M scores were correlated with the criterion to a positive, but statistically
insignificant, degree: r = .25, p = .10. There is a slight tendency
for better Section Managers to avoid defining situations as conflictual 
and to see them as non-frustrating. 

The first of the P-F Study scores relating to type of reaction, O-D, 
showed a practically zero correlation with the criterion: r = .02, p = .50.

The highest correlation between P-F Study scores and job ratings 
was found between E-D and the criterion: r = -.48, p = .01. Thus,
low-rated Section Managers tended to be more defensive when con- 
fronted by the test situations; i.e. they were overly concerned with fixing 
responsibility, either in assuming blame themselves or in blaming some- 
one else. 

N-P scores on the P-F Study showed a moderate relationship with 
the criterion: r = .38, p = .01. High-rated Section Managers tend to
have an adaptive, or solution-seeking, attitude for dealing with every-day 
problem situations. 
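
The significance levels attached to these correlations can be checked with the usual conversion of r to t, t = r sqrt(N - 2) / sqrt(1 - r^2), with N = 53. The sketch below is offered only as a check, not as the author's procedure; the critical values are approximate two-tailed figures for about 50 degrees of freedom and are assumptions, as are the function names.

```python
from math import sqrt

N_MANAGERS = 53
# Approximate two-tailed critical values of t for about 50 degrees of freedom.
T_CRIT_05 = 2.01
T_CRIT_01 = 2.68


def t_for_r(r, n):
    """Convert a product-moment r on n cases into a t with n - 2 df."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


def significance(r, n=N_MANAGERS):
    t = abs(t_for_r(r, n))
    if t >= T_CRIT_01:
        return "1% level"
    if t >= T_CRIT_05:
        return "5% level"
    return "not significant"


# Correlations between P-F Study scores and job ratings, as reported.
job_rating_r = {"E": -0.31, "I": 0.28, "M": 0.25,
                "O-D": -0.02, "E-D": -0.48, "N-P": 0.38}

for category, r in job_rating_r.items():
    print(f"{category:>3}: r = {r:+.2f}, t = {t_for_r(r, N_MANAGERS):+.2f}, "
          f"{significance(r)}")
```

On this check E and I reach the 5% level, E-D and N-P reach the 1% level, and M and O-D fall short, which agrees with the four significant relationships described above.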

Additional Statistical Data. Correlations were run between ratings 
and four variables, length of service, age, education, and sex. This was 
done to determine whether any of the reported relationships might be 
an artifact of one of these variables. Correlations were as follows: 
(1) between age and ratings: r = .19, p = .15; (2) between length of
service and ratings: r = .34, p = .02; (3) between education and ratings:
r = .25, p = .06; (4) between sex and ratings: r = .27, p = .04. Thus, the
latter three variables, length of service, education, and sex, are related
to job ratings to a statistically significant degree. Women tended to get 
higher ratings than men. 

P-F Study scores of the 15 highest-rated Section Managers are com- 
pared with scores of a like number of the lowest-rated in Table 2. The 
comparison of mean P-F Study scores of top-rated and bottom-rated 
Section Managers, shown in Table 2, confirms the earlier discussed cor- 
relational findings. However, the statistical significance of these differ- 
ences is greatly reduced by the small number of cases. In any event, 
differences do exist in the direction indicated by the overall correlations
on the total group of 53 Section Managers. 


Table 2

Comparison of P-F Study Mean Scores* of the 15 Highest-Rated Section
Managers and the 15 Lowest-Rated Section Managers

                                     Scoring Categories
                  E             I             M        O-D      E-D           N-P
Highest-Rated
  Mean            [illegible]   [illegible]   30.6     18.1     [illegible]   33.6
  Sigma           [illegible]   7.5           11.8     7.2      8.2           10.9
Lowest-Rated
  Mean            [illegible]   29.6          27.0     17.4     54.6          28.1
  Sigma           14.7          7.6           9.6      5.9      9.7           12.0
t                 [illegible]   1.15          .90      .26      1.83          1.28
p                 .28           .24           .36      .50      .06           .20

* Values for each category represent the proportion of the total number of responses
made in the test falling in that category.


Table 3 compares quartiles of P-F Study scores of the 15 highest- 
rated and 15 lowest-rated Section Managers. That there is a great deal 
of overlap between the top-rated and bottom-rated Section Managers’ 
scores is apparent. Thus, the P-F Study is not a highly valid selection 
device by any means, although tendencies do seem to be indicated 
insofar as performance in Section Managing is concerned. 

A further check on the efficiency of the P-F Study with the present 
occupational group was made by using a “combined P-F index.”* The
index was built by adding the number of I, M, and N-P responses made 
by each Section Manager, and then subtracting the number of E and 
E-D responses. In this way a simple algebraic expression, which could 


* This index was suggested to the writer by Dr. H. G. Gough, Department of Psy- 
chology, University of Minnesota, in a personal communication. 







Table 3 

Comparison of P-F Study Quartiles* for 15 High-rated 
and 15 Low-rated Section Managers 

                                   Scoring Categories 
                       E          I          M         O-D        E-D        N-P 
                   Q1 Md Q3   Q1 Md Q3   Q1 Md Q3   Q1 Md Q3   Q1 Md Q3   Q1 Md Q3 

High-rated Group   21 34 48   23 34 44   17 32 40   12 19 23   41 50 54   22 33 44 
Low-rated Group    32 43 52   24 29 35   19 26 35   10 18 23   52 54 66   23 26 35 

* Values for each category represent the proportion of the total number of responses 
made in the test falling in that category. 


be either positive or negative, was derived for each of the top-rated 15 
Section Managers and each of the bottom-rated 15. A comparison of 
indexes thus obtained on each of the two groups of Section Managers 
is shown in Table 4. 

If a cutting score were to be established at plus 2 we would eliminate 
5 of the top-rated 15 Section Managers and 11 of the bottom-rated 15. 


Table 4 

Comparison of Combined P-F Study Indexes of 15 Highest- 
Rated and 15 Lowest-Rated Section Managers 

Indexes 

High-Rated 
 16.0 
 15.5 
 12.5 
 12.0 
 12.0 
  9.0 
  6.0 
  4.5 
  4.0 
  2.5 
 -4.0 
 -7.5 
-10.0 
-13.5 
-18.5 



















The use of a simple index, such as that described here, corroborates the 
discussion of Table 3. Thus, the P-F Study is far from being a highly 
valid selection tool although it does warrant some consideration in the 
hiring of Department Store Section Managers. 
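
The index and the cutting-score arithmetic described above can be checked with a short sketch. It assumes the index is simply (I + M + N-P) minus (E + E-D), as the text states, and that a Section Manager is admitted when his index is at or above the cutting score; the function names and the "at or above" convention are illustrative assumptions, not part of the original study. Only the high-rated column of Table 4 is used.

```python
# High-rated indexes as listed in Table 4.
HIGH_RATED_INDEXES = [
    16.0, 15.5, 12.5, 12.0, 12.0, 9.0, 6.0, 4.5, 4.0, 2.5,
    -4.0, -7.5, -10.0, -13.5, -18.5,
]


def combined_index(i, m, n_p, e, e_d):
    """Combined P-F index: I, M and N-P responses added, E and E-D subtracted."""
    return (i + m + n_p) - (e + e_d)


def split_by_cutting_score(indexes, cutting_score=2.0):
    """Return (admitted, rejected) lists for a given cutting score.

    Assumption: an index equal to the cutting score is admitted.
    """
    admitted = [x for x in indexes if x >= cutting_score]
    rejected = [x for x in indexes if x < cutting_score]
    return admitted, rejected
```

Applied to the fifteen high-rated indexes with a cutting score of plus 2, the split gives the 10 admitted and 5 eliminated that the text reports for the top-rated group.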


Summary 


1. The Rosenzweig Picture-Frustration Study was administered to 
53 Department Store Section Managers. Quantitative measures of job 
efficiency were built from personnel review data and correlated with 
each of the six principal scores derived from the P-F Study. 

2. Statistically significant negative relationships occurred between the 
criterion and scores for Extrapunitiveness, and between the criterion and 
Ego-Defensive scores. Positive, statistically significant relationships be- 
tween the criterion and Intropunitiveness, and between the criterion 
and Need-Persistive scores were found. A positive, but not significant, 
correlation was found between the criterion and Impunitiveness. A near- 
zero relationship existed between job ratings and Obstacle-Dominance 
scores on the P-F Study. 

3. A simple technique of combining P-F scores into an index would 
admit 10 out of 15 top-rated Section Managers and would reject 11 out 
of 15 bottom-rated Section Managers if a cutting score of plus 2 were used. 

4. This investigation suggests that the Rosenzweig Picture-Frustration 
Study measures factors which are associated with occupational success 
as a Section Manager, and which might have value in an employment 
selection program. 


Received July 6, 1948. 











The Rorschach as a Predictor of Academic Success 


Boyd Rowden McCandless 
Ohio State University 


Many studies have been made, and many claims advanced for the 
Rorschach as a highly useful test in the area of academic prediction. 
The thinking behind the studies is perhaps best summarized in Klopfer 
and Kelly (3, p. 266): 


“If the Rorschach method could do nothing else but estimate the intellectual 
level of the subject as well as the usual intelligence tests, these tests would be 
preferable since they are simpler to apply. The importance of the Rorschach 
method for the intellectual aspect of personality diagnosis lies in something 
which no intelligence test attempts, the differentiation between potential 
capacity and actual efficiency.” 


Beck (1), Rappaport by implication (5) and Munroe (4) in a careful 
experimental study of college women concur in such an estimate of the 
Rorschach test as a measure of intelligence and a predictor of success. 

Munroe (4) has worked out a 28 item check list, usable with either 
the group or individually administered Rorschach. It is filled in by a 
protocol inspection method, and general adjustment has been found by 
her, working with women students at Sarah Lawrence, to correlate 
negatively with number of checks accumulated by the subject. In general, 
girls with fewer than 10 checks were reasonably adequately adjusted; girls 
with more than 10, moderately to seriously maladjusted (4, p. 66). The 
“Inspection Rorschach” adjustment rating predicted academic success 
somewhat better than did ACE percentile ratings, coefficients of con- 
tingency .43 and .36 for 348 subjects; corrected, .49 and .39 for the two 
tests respectively, (4, p. 76). 

Beck (1) has devised an organization, or Z score, to be derived, essen- 
tially, from individually administered Rorschach tests. To quote: 


. . . the sum of all the Z scores in any Rorschach record is the measure of 
S’s organization activity. These totals vary directly as the intelligence of S. 
The Z factor has certain virtues not inherent in W. For one thing, it takes 
account of much activity that W misses. Second, since it is not scored in 
discrete units, as is necessary in the case of W, it makes it possible to take 
account of intermediate values and continuous distributions, and is thus a more 
flexible measure. Third, it is an index of the intellectual energy as such, ir- 
respective of the kind of intelligence that S uses, something that does influence 
W. Thus Z is a more accurate representative of the intelligence functioning 
per se. ... it is therefore an index of thinking power. Its essence is the 
capacity to grasp relations not perceived by others (1, p. 12). 

























The three authors, Beck (1), Klopfer and Kelly (3) and Rappaport 
(5), less directly than the first two, assign predictive values in a general 
fashion to many categories of the Rorschach. Munroe (4), as stated, 
has done so specifically and empirically. 

Of the score for number of whole responses to blots produced, Beck 
says: “The higher the intelligence potential of an individual, the more 
W he can produce” (1, p. 10). Klopfer and Kelly state: “. . . (W) 
represents an emphasis on the abstract forms of thinking and the higher 
forms of mental activity” (3, p. 259). Both qualify the quantitative 
use of this W score, stating that the quality of W must be considered. 

Of large detail (D) and small detail (Dd) responses to the individual 
cards, Beck states that where emphasis is on D there is revealed “a person 
who attends to obvious and practical interests,” where Dd shows an 
“evidence of some need to pursue too much the elements that most people 
disregard”; and emphasis on W “is the sign of an over-all thinker” 
(1, p. 14). Klopfer and Kelly believe that the individual with approximately 
24 of his responses listed as D and Dd “has enough common sense 
to use the most obvious material before he starts seeking the unusual” 
(3, p. 260). 

Summarizing the thinking of the various authors on other categories, 
with perhaps some injustice, it appears to be, from the point of view of 
making predictions of efficiency: 

Animal (A) responses tend to indicate a certain amount of conformity 
of thought, too few indicating unusual thought processes, too many 
barrenness, lack of creativity and stereotypy. 

Popular responses have roughly the same meaning. 

The percentage of responses made on the basis of form alone indicates in a 
general way an intellectual, unemotional approach to life; the percentage 
of form responses at a superior level is directly related to functioning 
intelligence. 

Human movement responses betoken creative imaginativeness, with 
qualifications set on their location and type. 

Responses dominated by color, but with form present (CF), betoken 
an adjustment intermediate between infantile and fully, socially adult, 
as far as emotional control is concerned. Responses dominated by form 
but using color are given by the emotionally fairly rich but controlled, 
mature person. 

Vista or perspective responses (FV or FK) are used by persons who 
are self-critical and liable to “inferiority feelings.” Flat grays (FY or 
FC’) used in responses to the cards indicate anxiety and reduction of 
intellectual energy. 
















The broader the subject’s interests, the richer his educational 
background, and the higher his intelligence, the greater the variety of 
things he will see. 

In general, the more intelligent and the less anxious the subject is, 
the more complete human figures he will see; and the more whole human 
and animal with relation to detailed human and animal figures there 
will be in his record. Finally, the more intelligent he is, the larger the 
total number of responses he will give. Klopfer and Kelly (3, p. 208), 
however, do not agree fully with this. The seeing of things in the white 
spaces tends to betoken resistive, persistive and unusual methods of 
approach. 

It is with these elements of the Rorschach that the author has con- 
cerned himself in this study. He realizes most clearly that the Rorschach 
is essentially a configurative test, where a pattern of factors must be 
taken account of to make any really adequate interpretation or prediction. 
On the other hand, he feels that the more checks made on the predictive 
efficiency of the specific categories of the Rorschach for which predictive 
efficiency has been claimed, the more valid is the use of the test for such 
purposes of prediction. In the case of this study, this prediction is in 
the areas of academic progress and achievement. 


Subjects and Method 


Individual Rorschachs were given, in conjunction with vocational 
guidance, to approximately two hundred Officer Candidates during the 
writer’s assignment as Selection and Classification Officer to the U. S. 
Maritime Service Officers School, Alameda, California. These men were 
aspiring to marine licenses and commissions in the U. S. Maritime 
Service, and were undergoing a four months’ period of training pursuant to 
that end. Every subject who could be matched on the eight criteria 
used was selected for this study.¹ 

These men were “normal” in that they were functioning adequately 
in a wartime society, contributing to the war effort, making in general 
adequately and highly motivated progress toward their specific goal, and 
were in no case undergoing psychiatric treatment. 

Thirteen pairs of men were matched on the basis of AGCT score; 
average Mechanical Comprehension Test score; average Iowa Silent 
Reading comprehension test score (form Am, new, advanced); average 
Stanford Advanced Arithmetic Reasoning Test score; average age and 
amount of education; marital status (six married, two divorced, five 
single in each group); and enrollment in division of the school (ten 


1 This study reflects the author’s conclusions and is not an official Maritime Service 
publication. 
















members of each group were enrolled in Deck training, three in Engine 
training). 

The basis of differentiation was in terms of the academic grade 
averages. With a value of 5.0 assigned to grade A and 1.0 assigned to 
grade F, the high grade point group averaged 4.7 ranging from 4.5 to 
5.0; and the low grade point group struggled through the school with 
average grades of 2.9, ranging from 1.0 to 3.6. Some, indeed, of the low 
grade point group failed to qualify academically for their licenses and 
commissions. 


Table 1 

Quantitative Characteristics and Differences between 
High and Low Grade Point Groups 

                            High Grade   Low Grade              t of     Significant 
Characteristic                Point        Point      Diff.     Diff.    at % level 

AGCT                          135.7        135.7       ...      0.000    Greater than 5 
MCT                           141.5        139.4       ...      0.430    Greater than 5 
Arithmetic Equated Score       94.1         89.5       ...      1.367    Greater than 5 
Reading Standard Score        103.9         99.4       ...      1.213    Greater than 5 
Age (years)                    25.6         24.4       ...      0.679    Greater than 5 
Education (years)              12.3         12.1       ...      0.605    Greater than 5 





It will be noted from Table 1 that these men are very superior, 
psychometrically speaking. The mean AGCT score for the rank and 
file was set at 100, with a S. D. of 20 points. The average for this group 
was 1.75 S. D. above the national mean. 

The groups average 2 S. D. above the mean on mechanical compre- 
hension, as measured; and in math and reading comprehension, approach 
the average third year college man. 

The ranges for the equating scores are given in the following tabu- 
lation: 

Variable High Grade Point Low Grade Point 
AGCT 124-149 124-150 
MCT 123-161 119-161 
Math 67-103 65-105 
Reading 91-113 93-120 
Age 19-37 19-32 
Education 10-15 10-14 


Between these two groups of men, so similar in quantitative character- 
istics, so different in academic success, the problem was to distinguish, 
if possible, personality characteristics which might explain the efficiency 
differences. 

Their Rorschachs were studied intensively in an effort to find such 
distinguishing characteristics. 







Results 


The results of the present study were negative, with one exception. 
Table 2 summarizes the averages for the high and the low grade point 
groups. Differences in the direction of the low grade point group are 
indicated by a minus (—) sign; t (Edwards (2)) is given in the fourth 
column and the level of significance of the t in the fifth column. 
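
Because the two groups were formed as thirteen matched pairs, a t of the kind reported in the fourth column can be sketched as a matched-pairs (difference) t. This is an assumption: the paper cites Edwards (2) for t but does not say which formula was used, and the raw pair scores are not printed, so any scores passed to the function below are hypothetical. The function name is illustrative.

```python
import math
from statistics import mean, stdev


def paired_t(high_scores, low_scores):
    """Matched-pairs t: mean pair difference divided by its standard error.

    high_scores[k] and low_scores[k] must belong to the same matched pair.
    The sign is positive when the high group averages more.
    """
    if len(high_scores) != len(low_scores):
        raise ValueError("each high-group score needs a matched low-group score")
    if len(high_scores) < 2:
        raise ValueError("at least two pairs are required")
    diffs = [h - l for h, l in zip(high_scores, low_scores)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("all pair differences are identical; t is undefined")
    standard_error = sd / math.sqrt(len(diffs))
    return mean(diffs) / standard_error
```

With thirteen pairs the result would be referred to a t table with 12 degrees of freedom.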

It will be noted that t approaches the one per cent level of confidence 
in only one case, mean number of popular responses. Even here, the 
difference is of little practical significance (8.1 versus 6.6 mean popular 
responses for the respective groups). 

Differences favoring the high grade point group, with t’s above 1.0, are 
found for: Mean number large detail responses; Mean number tiny detail 
responses; Mean number space responses; Mean number human movement 
responses; Mean number pure form responses; Mean number superior pure 
form responses; Mean number animal responses; Mean number human 
detail plus animal detail responses; Mean number popular responses; and 
Mean quality of whole responses. 

Differences favoring the low grade point group, with t’s above 1.0, are 
found for: Mean number whole responses; Mean number achromatic color 
responses (both including and excluding texture); and Mean number of 
color-form responses. 

It has been considered by most of the authors working with the 
Rorschach that the ratio of W (whole blot) responses to the number of 
M (human movement) responses is one of the best of the predictive 
factors for “efficiency” or productivity, with a ratio of 3 to 1 being con- 
sidered optimal. Ratios falling materially below 3.0 are considered to 
characterize “underproductive” persons; ratios falling materially above 
3.0 are considered to characterize “over-striving” persons, whose performance 
is likely to be describable as “quantity” rather than “quality.” 
These latter may produce much, be over-ambitious, under considerable 
strain; and their products are likely to be superficially acceptable rather 
than really good. 
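
The W:M rule of thumb just described can be put in a minimal sketch. The text gives 3 to 1 as the optimal ratio but does not say how far a ratio must depart from 3.0 to fall "materially" above or below it, so the tolerance parameter here is a purely hypothetical stand-in, as are the function names.

```python
def wm_ratio(whole_responses, human_movement_responses):
    """Ratio of W (whole) to M (human movement) responses.

    Returns None when there are no M responses, since the ratio is undefined.
    """
    if human_movement_responses == 0:
        return None
    return whole_responses / human_movement_responses


def classify_wm(ratio, optimal=3.0, tolerance=0.5):
    """Label a W:M ratio against the 3-to-1 rule of thumb.

    tolerance is a hypothetical band around the optimum; the source
    only says "materially" above or below 3.0.
    """
    if ratio < optimal - tolerance:
        return "underproductive"
    if ratio > optimal + tolerance:
        return "over-striving"
    return "near optimal"
```

The labels apply to a single record; a group mean ratio can be passed in the same way, with the caveat that the cut-offs depend entirely on the tolerance chosen.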

If such considerations hold for these two groups, we should expect 
the high grade point men to have a mean ratio approximating 3.0, which, 
if deviant, would probably be expected to be above 3.0; the low grade 
point group would be predicted to show a mean ratio falling below 3.0. 
As can be seen from Table 2, the opposite is true, the high grade point 
men showing a mean ratio of 1.6; the low grade point men a ratio of 3.4. 
The difference, however, is not a statistically significant one. 

Beck’s Z or organization score (a measure of the “capacity to grasp 
relations not perceived by others” (1, p. 12)) differentiated even less 
effectively than the conventional Rorschach categories discussed above. 







Table 2 

Selected Rorschach Differences and their Significance 
for High and Low Grade Point Groups 

                                          High     Low 
                                          Grade    Grade             t for    % Level of 
Mn. for Category                          Point    Point    Diff.    Diff.    Confidence 

N Responses                               39.4     32.6      6.8     0.916    Greater than 5 
N Whole R’s¹                               6.4     10.0     -3.6     1.532    Greater than 5 
N Detail R’s¹                             26.1     19.8      6.3     1.284    Greater than 5 
N Tiny Det. R’s¹                           6.1      2.7      3.4     1.183    Greater than 5 
N Main and Additional Space R’s¹          11.3      8.4      2.9     1.029    Greater than 5 
N Human Mov’t R’s                          5.8      3.5      2.3     1.257    Greater than 5 
N Animal Mov’t R’s²                        4.2      3.2      1.0     0.957    Greater than 5 
N Inanimate Mov’t R’s²                     1.8      1.7      0.1     0.121    Greater than 5 
N Vista R’s                                1.9      1.8      0.1     0.146    Greater than 5 
N Form R’s                                18.8     13.1      5.7     1.528    Greater than 5 
N Superior Pure Form R’s¹                 13.8     10.3      3.5     1.791    Greater than 5 
N Superior of Total R’s in form¹          30.4     25.4      5.0     0.935    Greater than 5 
N Achromatic Color R’s²                    2.2      4.8      ...     1.525    Greater than 5 
N Achromatic Color R’s¹                    3.2      ...      ...     1.068    Greater than 5 
N Form-texture R’s²                        3.0      2.3      0.7     0.321    Greater than 5 
N Form-color R’s⁴                          3.2      3.4      ...     0.119    Greater than 5 
N Color-form R’s                           ...      2.5      ...     1.308    Greater than 5 
Mn. Sum Color²                             ...      4.5      ...     0.764    Greater than 5 
N Human R’s                                ...      2.6      ...     0.956    Greater than 5 
N Animal R’s                               ...      ...      ...     1.555    Greater than 5 
N Human Detail + Animal Detail R’s         4.0      ...      ...     1.789    Greater than 5 
N Anatomy R’s                              ...      2.9      ...     0.610    Greater than 5 
N Popular R’s¹                             8.1      6.6      ...     2.836    Less than 5 (1) 
N Response Categories                      ...      ...      ...     0.741    Greater than 5 
Z Score¹                                  43.3     48.5      ...     0.409    Greater than 5 
N Checks Munroe                           11.4     12.1      ...     0.359    Greater than 5 
N Superior Wholes³                         ...      2.4      ...     0.555    Greater than 5 
Mn. Whole Quality²                         ...      1.7      ...     1.863    Greater than 5 
N Whole::N Human Mov’t Ratio⁵              1.6      3.4      ...     0.691    Greater than 5 
Human + Animal R’s::Human Detail + 
  Animal Detail Ratio                      2.7      5.8      ...     ...      Greater than 5 
Human Mov’t::Sum Color Ratio⁵              2.0      1.1      ...     ...      Greater than 5 

¹ After Beck’s (1) criteria. 

² After Klopfer’s and Kelly’s (3) criteria. 

³ After Rappaport’s (5) criteria. 

⁴ There were too few pure color, or texture-form or pure texture responses to compute 
a legitimate difference. 

⁵ Based on 11 pairs, due to zero in numerator of 2 ratios. t was computed on these 
ratios as with N’s, since the various Rorschach authors seem to regard the relationship as 
a unit or entity. 







What slight, statistically non-significant discrimination it did make was 
in the wrong direction; mean Z score for the high grade point men was 
43.3, with a range from 9 to 99.5; for the low grade point men, mean Z 
score was 48.5, with a range from 6 to 122.5. For this difference t was .409. 

As a final check, Munroe’s (4) check sheet, which gave positive results 
for the Sarah Lawrence students, was filled out for each man. Here 
the small difference was shown in the right direction (high grade point 
men averaged 11.4 checks; low grade point men 12.1 checks). The range 
was wider, however, for the former group (4-28) than for the latter 
(4-20). By Munroe’s findings, which take ten checks as a cutting score 
(4, p. 66), the men would also appear to be seriously maladjusted. Her 
students were not given the individual Rorschach, which may account for 
the greater number of checks earned by this group. 


Discussion 


Despite the consistently negative results of this investigation, certain 
trends appear according to prediction. In general, the high grade point 
men are seen to be slightly more controlled emotionally or with less 
emotion to control, slightly more productive; on most criteria, slightly 
less anxious. They tend to show up with higher averages, even when the 
factor of their higher productivity is cancelled out, in the scores which 
indicate conformity (except for space responses), and appear, although 
not significantly, better able to attend to both the large, usual details 
and the tiny, unusual details of the Rorschach blots. If one can generalize 
from such a tendency, it might be said that such a solid, conforming, 
non-theoretical approach is one of the bases for academic success, 
particularly in a “cram” type of program such as the Officer Candidate 
programs tended to be. 
The only significant difference (more popular responses for the successful 
students) fits this trend. 

The author does not feel that the findings of this paper detract from 
the clinical use of the test; but he believes it essential that many such 
checks as this be made. Finally, he grants the extreme difficulty of the 
task to which the Rorschach has been set in this case (restricted range 
and high level of ability, possible similarity of personality due to choice 
of occupation, small number of cases, etc.). Many authors, however, 
appear to have taken it for granted that the task could easily be accom- 
plished. 

Finally, other patterns, or combinations of the factors discussed above, 
or some total scoring, weighting system other than Munroe’s could con- 
ceivably be found to make a clear differentiation between these groups 
of men who differed so significantly in performance in the highly motivated 
Officer Candidate situation. The author’s repeated scrutiny of 
the tests has failed, however, to reveal such patterns. 


Summary 


Two matched groups of Officer Candidates, U. S. Maritime Service, 
who differed widely in academic achievement in a highly motivated, 
wartime, officer training program, were given individual Rorschachs 
with the following results: 

1. An analysis of the conventional Rorschach categories failed to 
demonstrate any important statistically significant differences, although 
trends appeared. 

2. Munroe’s (4) check list, which discriminated good from poor students 
at Sarah Lawrence College, failed to show differences in this group. 

3. Beck’s (1) Z or organization score also failed to make discrimina- 
tions. In fact the latter showed slight mean differences in a direction 
opposite to expectations. The statistically non-significant, but consistent 
trends were toward more emotional control, more conformingness, less 
anxiety on most criteria, more attention to concrete details, and slightly 
greater productivity for the high grade point men. 


Received July 12, 1948. 


References 


1. Beck, S. J. Rorschach’s Test, II. New York: Grune and Stratton, 1945. Pp. xii 
+ 402. 

2. Edwards, A. L. Statistical analysis. New York: Rinehart and Company Inc., 
1946. Pp. xviii + 360. 

3. Klopfer, B., and Kelly, D. McG. The Rorschach technique. New York: World 
Book Company, 1942. Pp. x + 436. 

4. Munroe, R. L. Predictions of the adjustment and academic performance of college 
students by a modification of the Rorschach method. Stanford University: Stanford 
University Press, 1945. No. 7 of the Applied Psychological Monograph. Pp. 
104. 

5. Rappaport, D., Gill, M., and Schafer, R. Diagnostic psychological testing. Chicago: 
Year Book Publishers, Inc., 1946. Pp. xi + 516 (Vol. II). 





The OL Key of the Strong Vocational Interest Blank for Men 
and Scholastic Success at College Freshmen Level * 


Stanley R. Ostrom 
Department of Public Instruction, Dover, Delaware 


Psychologists have developed instruments that measure abilities and 
aptitudes with a fair degree of accuracy. The use of these instruments 
for prediction purposes in learning situations has not proved as successful 
as one might hope, however. This may be due, to some degree, to non- 
intellectual traits which cause some individuals to persevere through 
discouragements while others of apparently equal potential fail. The 
measurement of these traits has proved most elusive. 

Counselors using the Strong Vocational Interest Blank for Men have 
to a large degree assumed that the Occupational Level key of the Strong 
blank is one approach to this problem. This position is verbalized by 
Darley (2, p. 66): 


Clinical experience together with limited experimental data would indicate 
that the lowest occupational level scores on the revised blank will accompany 
the interest type previously defined as “lower level jobs.” Furthermore, an 
excessively low occupational level score seems at present to be associated with 
lack of “staying power” or “survival power” in college competition. This 
hypothesis should be tested as quickly as research data accumulate, by careful 
studies of matched groups, since it is a phase of the “level of aspiration” and 
general motivational problem. 


Strong holds the same position, stating “Men with high OL scores have 
the interests of business executives and professional men, but those with 
low scores have the interests of workmen” (5, p. 195). He further 
suggests that the key has value for a counselor helping a student plan 
his high school or college training program (5, pp. 203-204). 

Specific statistical studies for the corroboration of these hypotheses 
are, however, very meager. Berdie (1) reports a correlation of only .03 
between the OL key and academic achievement of forty-three college 
students. He also found an equally low correlation, .01, when he com- 
pared the OL scores with curricular satisfaction. 


* The author wishes to acknowledge the aid and advice of Dr. Milton E. Hahn in 
planning the study on which this article is based. Special credit should be given Dr. 
William Kendall, Dr. Maurice Troyer, Dr. C. Robert Pace and Dr. Eric Gardner for 
their help in executing and interpreting the results of the research. The author’s 
Doctor’s thesis, from which the study is taken, is on file at Syracuse University. 


















Kendall (3), on the other hand, obtained positive results when he 
studied 300 male college freshmen at Syracuse University. He found 
that when academic ability, as measured by the Ohio State Psychological 
Examination, Form 21, was held constant, three groups distinguished by 
differing levels of OL were found to differ in college achievement. His 
three groups consisted of 100 men each of high, average, and low OL. 
The difference between these groups, when adjusted for ability by 
covariance, proved significant beyond the five per cent level but not at the 
one per cent level of confidence. Kendall concluded “if used with 
caution OL scores at the extremes of the distribution should be helpful 
to the counselor in making judgments concerning individual chances for 
scholastic success.” 

These studies give impetus to the need for further research as sug- 
gested in the last sentence of the statement by Darley referred to above. 

To test further the above hypothesis the writer conducted a study 
in which an attempt was made to determine the relationship between the 
OL key of the Strong Blank and scholastic achievement at three levels of 
education. The following discussion is a report of the findings at the 
college freshman level. 

As is the case each year, the 1946-1947 freshman class at Syracuse 
University participated in a testing program shortly after enrolling in 
school. Among other tests taken by the men were the Ohio State 
Psychological Examination, Form 21, and the Strong Vocational Interest 
Blank for Men. From these test data six groups of seventy-five men each 
were chosen according to the following criteria: 


High level, high ability: Men whose OL scores were equal to a standard 
score of fifty-seven or above and whose raw scores on the Ohio State Psychologi- 
cal Examination, Form 21 were ninety and above. 

Average level, high ability: Men whose OL scores were between standard 
scores of forty-seven and fifty-two, and whose raw scores on the Ohio State 
Psychological Examination, Form 21 were ninety and above. 

Low level, high ability: Men whose OL scores were equal to a standard score 
of forty-five and below, and whose raw scores on the Ohio State Psychological 
Examination, Form 21 were ninety and above. 

High level, low ability: Men whose OL scores were equal to a standard 
score of fifty-seven or above, and whose raw scores on the Ohio State 
Psychological Examination, Form 21 were below ninety. 

Average level, low ability: Men whose OL scores were equal to a standard 
score of between forty-seven and fifty-two, and whose raw scores on the Ohio 
State Psychological Examination, Form 21 were below ninety. 

Low level, low ability: Men whose OL scores were equal to a standard score 
of forty-five and below, and whose raw scores on the Ohio State Psychological 
Examination, Form 21 were below ninety. 
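
The six selection criteria above amount to a two-way classification, sketched below. The cut-offs are those stated in the criteria; treating "between forty-seven and fifty-two" as inclusive of both end points is an assumption, and the function name is illustrative. OL standard scores that fall in the gaps between bands (46, and 53 through 56) belong to none of the six groups, so the function returns None for them.

```python
def classify_group(ol_standard_score, ohio_raw_score):
    """Assign a freshman to one of the six OL-by-ability groups.

    Returns a (level, ability) pair, or None when the OL standard score
    lies outside the three bands used in the study.
    """
    if ol_standard_score >= 57:
        level = "high level"
    elif 47 <= ol_standard_score <= 52:  # assumed inclusive at both ends
        level = "average level"
    elif ol_standard_score <= 45:
        level = "low level"
    else:
        return None  # 46, or 53 through 56: not used in any group

    ability = "high ability" if ohio_raw_score >= 90 else "low ability"
    return level, ability
```

Leaving the intermediate OL scores unassigned keeps the three level bands clearly separated, which is what the criteria as written imply.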


Findings 


The mean honor point ratios were determined for each of the six 
groups. From Table 1, it can be seen that an even step progression from 







low to high OL and from low to high ability emerged except in one 
instance, that of average to high OL in the low academic group. 


Table 1 

Average Honor Point Ratios for Six Groups of Syracuse 
University Male Freshmen (Total = 450) 

                  Mean Honor Point Ratios 
              High OL    Average OL    Low OL 

High Ohio      1.742       1.569       1.357 
Low Ohio       1.058       1.194       1.036 





These data were then subjected to analysis of variance. Table 2 
shows F-ratios for both OL and academic aptitude at magnitudes great 
enough to justify the rejection of the Null Hypothesis at the one per cent 
level of confidence. 


Table 2 

Analysis of Variance: Multiple Classification for 450 
Syracuse University Male Freshmen 
(Determining Effects of Ability and Level) 

Source of      Degrees of     Sum of        Mean                   Test of 
Variance       Freedom        Squares       Squares      F*        Hypothesis** 

Ability            1            238,496     238,496      72.23     Reject 
Level              2             37,971      18,985       5.75     Reject 
Interaction        2             28,323      14,162       4.27     ... 
Residual         444          1,467,232       3,302       ... 

Total            449          1,772,022 

* Where F = greater mean square/lesser mean square. By referring to Snedecor’s 
tables of F (4, pp. 222-225), we may use the following three rules in testing the 
hypothesis: (a) reject the hypothesis tested, if the calculated value of F is greater than the 1% 
point given in the tables; (b) accept the hypothesis tested, if the calculated value of F is 
less than the 5% point given in the tables; (c) remain in doubt, if the calculated value of 
F lies between the 5% and 1% points given in the tables. 

** The Hypothesis tested is a null hypothesis concerning the difference between 
means of groups, i.e., there is no significant difference between the means of groups. 
(The 1% point necessary for rejection of the Null Hypothesis was 6.70 for ability and 
4.66 for level.) 


Conclusions and Recommendations 


1. A very significant relationship was established between honor point 
ratio and both academic aptitude and OL in the Syracuse University 













freshmen sample. This result strengthens Kendall’s study and gives a 
strong case to the use of OL scores in prediction of college success. It 
does not, of course, justify the use of the key as a single measure of motiva- 
tion, but it does point up its rightful place in a predictive battery. 

2. Standardization of OL on a school population. The Occupational 
Level Key of the Strong Blank was standardized by contrasting “unskilled 
men” and “business and professional men earning $2,500 and upwards 
a year” (5, p. 185). An obvious result of using such a scale on a 
college population is the large number of high OL scores among college 
students. Finding men from the freshman class for the two low OL 
groups was extremely difficult; so difficult, in fact, that it was necessary 
to include men with scaled scores of forty-five to assure groups of 
seventy-five. Setting up an OL key standardized on college groups would 
undoubtedly result in a sharper instrument. 

3. Follow-up study of college freshmen group. Repeating the college 
freshmen study four years after the original study will be revealing if the 
four year college honor point ratios are available for each group. 

4. Study of the high OL-low ability college freshmen. No reason is 
available to explain the sharp drop in mean honor point ratio between 
the average OL and high OL groups of low ability. An intensive study 
of a generous portion of this group to find answers for this deviation from 
the expected pattern is recommended. 


Received July 21, 1948. 


References 


1. Berdie, R. F. Prediction of college satisfaction and achievement. J. appl. Psychol.,
1944, 28, 239-245.

2. Darley, J. G. Clinical aspects and interpretation of the Strong Vocational Interest
Blank. New York: The Psychological Corporation, 1941.

3. Kendall, W. E. The occupational level scale of the Strong Vocational Interest
Blank for men. J. appl. Psychol., 1947, 31, 283-287.

4. Snedecor, G. W. Statistical methods. Ames, Iowa: Collegiate Press Inc., 1946.

5. Strong, E. K. Vocational interests of men and women. Stanford, California: Stanford
University Press, 1943.





Note On the Shifts of Interest with Age 
E. L. Thorndike 


Professor Emeritus, Columbia University 


Thirty-seven men, all graduate students of education, ranging in age 
from 23 to over 40, reported, as well as they could estimate, the relative 
strength of the following tendencies, each for himself at the present time, 
and for himself at the age of 12: Approval (having people look up to 
you); Mastery (being boss); Kindliness (seeing people happy); Gregariousness¹
(being with one’s own crowd); Studying things; Studying people;
and Studying abstractions.

These men had been studying educational psychology and had a 
certain common basis for their definitions of the above. Doubtless, 
however, the terms did not mean quite the same things to the different 
individuals, and it would probably be impossible to define with precision 
just what they did mean to the average of the group. Within limits, 
however, these terms do have a community of meaning to them and to 
the readers of this note. The change from 12 to adult age (around 30 
in the case of this group) was: a loss of 214 steps for Approval; a loss of 
11% steps for Mastery; a gain of 4 step for Kindliness; a loss of 2 steps 
for Gregariousness; a loss of 114 steps for Studying things; a gain of 3% 
steps for Studying people; and a gain of 214 steps for Studying abstrac- 
tions. For a group of lawyers, or doctors, or engineers, or business men, 
the shifts with age might well be different. 

These facts seem worth noting, especially the different effect of age 
upon the interest in studying things as compared with studying people 
and abstractions, and the absence of any substantial change in kindliness. 
According to traditional fiction, a boy of twelve is brutal and careless of 
others. 

These same records can be studied from the point of view of the per- 
manence of the tendencies as reported. Assuming the validity of the 
testimony, the facts show that a person’s nature at 12 is prophetic of 
his nature in adult years in this respect (the median correlation for the 
37 cases is +.55). The child to whom approval is more cherished than 
mastery is likely to become a man who seeks applause rather than power, 
and similarly throughout. The effect of chance errors, forgetfulness, 
and the like, is to make this correlation too low. The effect of a constant
error whereby a person projected his opinion of himself to form his 
opinion of his own past would be to make the relation closer than it 
really was. The net result of eliminating these errors would, I con- 
jecture, be to raise the correlations somewhat. 

Received June 14, 1948. 

¹ This perhaps would be more suitably named “a mixture of gregariousness and
sociability.”















A Fallacy in the Use of Median Scale 
Values in Employee Check Lists 


Clifford E. Jurgensen 
Minneapolis Gas Light Company 


Several investigators (1, 2, 4) have published articles using the 
Thurstone equal-appearing intervals method, or a slightly modified form 
of the method, to select and weight items in a check list to be used for 
rating employees. The author has developed similar unpublished check 
lists and is familiar with a number of other unpublished scales developed 
for or by various companies. It thus appears that the procedure is 
sufficiently used to warrant mentioning a fallacy which appears when 
the equal-appearing interval method is used in an industrial merit 
rating scale. 

Briefly, the method consists of obtaining a large number of state- 
ments which relate to good or poor job performance. Statements are 
printed separately on cards which are then sorted by a large number of 
judges according to the method of equal-appearing intervals. In some 
cases statements are printed serially and judged by encircling a number 
from 1 to 9 preceding each statement, this procedure having been shown 
(5) to give the same results as sorting. The median and semi-inter- 
quartile range for each statement is computed by formula or by nomo- 
graph (3). Statements with a large semi-interquartile range are elimi- 
nated, and the remaining items form a pool from which scale items are 
selected in such manner that statements differ in scale value by approxi- 
mately equal differences. A tentatively selected scale is used experi- 
mentally, tests of item relevancy are made, and the scale is modified 
where necessary. The final scale is used by asking raters to check items 
which describe or apply to the employee being rated. The “score” is
the median or mean scale value of the checked statements.

The scaled statement technique assumes that all items form a single 
continuum which is factorially pure. This assumption has not even been 
loosely approximated in any employee merit check list seen by the author. 
The typical employee check list contains items dealing with work output, 
quality, learning ability, job skills, personality, work habits, and many 
other types of items. Customary tests of item relevancy are generally 
applied to statements in such check lists, but these tests eliminate only 
those items which have a low or negative correlation for the group of 
persons under consideration. It is quite possible that items may show 
a high positive correlation within a group, but an individual may nevertheless
differ widely from the group tendency. For example, studies
show a relatively high positive correlation between speed and accuracy
of work. It is not uncommon, however, for an industrial supervisor or 
executive to challenge this finding on the basis that some of his workers 
show such great differences in speed and accuracy that the overall finding 
is untenable to him. 


Table 1

Comparison of Two Types of Scale Values
with Reference to Three Employees

                                            Median   Revised
                                            Scale    Scale        Employee
Item                                        Value    Value      A     B     C

Is one of the best employees in the
  department                                 8.6      3.6       X     X     X
Has unusually good quality                   8.4      3.4       X     X     X
Carries through on all jobs                  8.2      3.2       X     X     X
Is extremely loyal                           8.0      3.0       X     X     X
Gives close attention to instructions
  of supervisor                              7.8      2.8       X     X     X
Plans work well                              7.6      2.6       X     X     X
Has good judgment                            7.4      2.4       X     X     X
Learns new work easily                       7.2      2.2             X     X
Is enthusiastic                              7.0      2.0             X     X
Reacts favorably to corrections              6.8      1.8             X     X
Starts work earlier than others              6.6      1.6             X     X
Is a steady worker                           6.4      1.4                   X
Gets help when in difficulty                 6.2      1.2                   X
Profits from past mistakes                   6.0      1.0                   X
Is pleasant and courteous                    5.8      0.8                   X
Does fair share of work                      5.6      0.6                   X
Does not alibi when corrected                5.4      0.4                   X

Total score based on:  Median Scale Value                      8.0   7.6   7.0
                       Revised Scale Value                    21.0  28.6  34.0




For purpose of illustration, Table 1 gives seventeen items in order of 
their scale value as determined by one hundred supervisors. The items 
form the positive or favorable half of a scaled check list. They are all 
satisfactory for use in a check list so far as tests of relevancy and ambi- 
guity are concerned. 

Ratings of three hypothetical employees are given in columns headed 
A, B, and C. (It is assumed that none of these three persons has been 
checked on any items falling below the median scale value of 5.0.) The 
median scale values for the three employees are 8.0, 7.6, and 7.0 respectively.
It will be noted that A is the “best” employee because he
does not learn new work easily, is not enthusiastic, does not react favor- 
ably to corrections, etcetera! Employee C is the worst of the three 
employees because he possesses all the listed virtues and performs all the 
favorable actions! 













From a theoretical position it can be contended that the above findings 
will not commonly be found in actual cases if items are properly selected. 
However, the presence of the error of median scale score was originally 
found by the author when it was noticed that the “better” of two em- 
ployees obtained the lower of the two scores on a scale developed by the 
usual approved techniques. Other such cases were subsequently found. 
The decreased validity of the scale (whether large or small) is only one of 
the objections to the method. An even more serious objection is that 
the entire scale might fall into disrepute and discard if a few of the raters 
were to discover that overall scores would increase in magnitude if some 
of the favorable (but low value) items were not checked even if applicable 
to the employee being rated. 

A simple solution to the above fallacy is to replace median values by 
positive and negative values obtained by subtracting five from each of 
the item medians. (This assumes that scaling was based on nine equal- 
appearing intervals. The constant would differ for other numbers of 
groups.) The merit rating “score” for each employee is the algebraic sum 
of the revised weights for items checked as applying to the employee. 
For the three hypothetical employees referred to in Table 1, the revised
scale scores would be 21.0, 28.6, and 34.0. It will be noted that the order
of merit is the reverse of that obtained from the median scale value
method, and that the revised order is consistent with logic.
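
The contrast between the two scoring methods can be reproduced with a short sketch. It assumes that Employees A, B, and C were checked on the top seven, the top eleven, and all seventeen items of Table 1 respectively (the assignment consistent with the medians of 8.0, 7.6, and 7.0); the function and variable names are illustrative only.

```python
from statistics import median

# Median scale values of the seventeen items in Table 1: 8.6, 8.4, ..., 5.4.
ITEM_MEDIANS = [round(8.6 - 0.2 * i, 1) for i in range(17)]


def median_score(checked):
    """Conventional score: the median scale value of the checked items."""
    return median(checked)


def revised_score(checked, midpoint=5.0):
    """Revised score: algebraic sum of (item median - midpoint)."""
    return round(sum(value - midpoint for value in checked), 1)


# Assumed check patterns (top 7, top 11, and all 17 items).
employees = {
    "A": ITEM_MEDIANS[:7],
    "B": ITEM_MEDIANS[:11],
    "C": ITEM_MEDIANS[:17],
}

for name, checked in employees.items():
    print(name, median_score(checked), revised_score(checked))
```

Under these assumptions the median score falls as more favorable items are checked, while the summed deviations rise, which is the reversal described above.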

Previous discussion has been limited to median scale values. Exactly 
the same situation, however, is true for mean scale values. 

The above is proposed as a simple solution to the error of median 
scores. The scoring of scaled check lists on the basis of algebraically 
summed deviations is just as easy as use of mean scale values. Even 
though the validity of the scale may not be increased greatly (for the 
group as a whole) by this change in scoring procedure, scores of specific 
individuals sometimes change appreciably. The use of the inaccurate 
median (or mean) scale value does not appear defensible on logical grounds 
even though it seldom results in significant error. 


Received July 28, 1948. 
References 


1. Ferguson, L. W. The development of a method of appraisal for assistant managers. 
J. appl. Psychol., 1947, 31, 306-311. 

2. Knauft, Edwin B. Construction and use of weighted check-list rating scales for two 
industrial situations. J. appl. Psychol., 1948, 32, 63-70. 

3. Jurgensen, Clifford E. A nomograph for rapid determination of medians. Psy- 
chometrika, 1943, 8, 265-269. 

4. Richardson, M. W., and Kuder, G. F. Making a rating scale that measures. Person.
J., 1933, 12, 36-40.

5. Seashore, R. H., and Hevner, K. A time-saving device for the construction of 
attitude scales. J. soc. Psychol., 1933, 4, 366-372. 





An Empirical Approach to a Problem 
of Psychophysical Scaling * 


William H. Angoff 


Human Engineering Branch, Special Devices Center, 
Port Washington, New York 


Since Thurstone’s original work in the scaling of crimes by means of a 
paired-comparison procedure (2), numerous psychological judgments, 
including those concerning attitudes, have been ordered to continua in 
similar fashion with success. Specifically in industry, a scale has been 
developed for the quality of work performed by industrial supervisors 
(3). Another application in industry of the paired-comparison technique 
may conceivably be that of scaling jobs within an industrial plant. In 
this latter instance, job levels may be determined by the combination of 
scale values for factors attaching to a particular job, or they may be de- 
termined by the simple scaling of the jobs as a whole without regard to 
separate factors. In either instance, the jobs could be ordered to a single 
continuum which would then define the hierarchy. The theoretical
defensibility and practical simplicity of such a job evaluation approach
appear to constitute an unquestionable advantage over the procedures
currently in use. However, there would appear to be a practical difficulty
in the situation involved in job evaluation. Wherever job hierarchies 
are determined, it is frequently the case that new jobs are added or old 
jobs changed as the plant continues to function. If an entirely new scale 
of n items or jobs is developed each time the original items or jobs are 
altered or increased in number, the procedures of scaling, particularly 
where large numbers of judgments are involved, can become a very costly 
and time-consuming affair. It is therefore suggested that new items, 
whether they are jobs or other judgment-objects, may be inserted and 
placed, as they appear, in their proper positions in a scale that has already 
been set up and found to be satisfactory; and that a new rescaling of all 
items would not then be necessary. 

The present study attempts to duplicate in miniature such a situation 
as might obtain in an industrial plant, where a new item is added to a 
scale which has already been determined, and is presumably in use. 


* The author would like to express his gratitude to Dr. C. H. Lawshe and Dr. N. C.
Kephart of Purdue University, where the work was done, for their advice and assistance
in the preparation of the manuscript.


















Procedure 


A group of ten male movie actors were chosen who are well known 
to the public, and were used as object-choices. The subjects making the 
comparisons were twelve in number, and relatively advanced in terms of 
level of education, intelligence, and sophistication with regard to tastes 
in moving pictures. The ages of the subjects ranged from 26 to 36 years. 

Forty-five cards were prepared with every one of the ten names of 
the object-choices paired with every other of the remaining names. No 
pair occurred more than once in the deck of 45 cards. The stimulus- 
statement was prepared in advance and read by each of the twelve 
subjects prior to making his choices. The statement read as follows: “In 
the following pairs of movie actors, choose the one you would prefer 
to see in a moving picture. Use whatever basis you please for your 
decision.” The choices for the 45 pairs were made separately and independently
by each subject and recorded by the experimenter on the spot.

It may be noted that no attempt was made in the experiment to assure 
uni-dimensionality or high reliability in the eventual scale. The movie 
actors chosen for the study are all current popular favorites, and it was 
expected that there would be much disagreement among the subjects 
with regard to their preference-choices—as indeed there was. Thus as 
much opportunity as possible was provided to permit the scale values to 
be affected by the withdrawal or insertion of items. Also, as was ex- 
pected, the range of scale values that resulted was narrow, permitting 
slight shifts in scale values to exert considerable effects upon the correla- 
tion coefficients that were to be computed. 

With regard to the question of uni-dimensionality, it was felt that 
while the concept is a highly important one in the usual scaling problem, 
it was a consideration not relevant to the problem here. The purpose of 
the present study was to manipulate preference-judgments as they were 
turned in by the judges. The particular manner in which the scale was 
constructed, and the assumptions underlying the construction of scales 
were felt to be matters for separate consideration. 

The choices having been made by all the subjects, a table of paired 
frequencies was drawn up, and a standard-score scale-value was determined
for each movie actor directly from the percentages. The percentages
represented the ratio of his “preferred” frequencies to the total
possible “preferred” frequencies. Constant values were added to each
scale value to convert them to positive numbers, and finally a 10-item 
scale was constructed which then constituted the basic or “criterion” 
scale. 
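
The scale construction just described might be sketched as follows. This is a minimal illustration under stated assumptions, not the author's computation: the 4 × 4 proportion matrix is hypothetical, each item's proportion is taken as its row mean (self-comparison counted as .500), the standard score is the inverse normal of that proportion, and the added constant is chosen so that the lowest item falls at zero.

```python
from statistics import NormalDist


def scale_values(proportions):
    """Return non-negative standard-score scale values, one per item.

    proportions[i][j] is the share of judges preferring item i to item j;
    the diagonal (self-comparison) is assumed to be 0.5.
    """
    n = len(proportions)
    preferred = [sum(row) / n for row in proportions]      # proportion "preferred"
    z = [NormalDist().inv_cdf(p) for p in preferred]        # standard scores
    lowest = min(z)
    return [value - lowest for value in z]                  # shift so minimum is 0


# Hypothetical preference matrix for four items (rows preferred to columns).
P = [
    [0.500, 0.583, 0.667, 0.750],
    [0.417, 0.500, 0.583, 0.667],
    [0.333, 0.417, 0.500, 0.583],
    [0.250, 0.333, 0.417, 0.500],
]

for item, value in zip("WXYZ", scale_values(P)):
    print(item, round(value, 3))
```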

The specific problem now involved deriving a nine-item scale consisting
of all but one of the items. This nine-item scale would correspond
to the scale referred to above that “has been determined and is presumably
in use.” The tenth item, not included in the scale, would correspond
to the new item which must be inserted into the pre-existing scale. The 
question arises: Can we have our judges make paired comparisons only 
between the new item and the nine old items in order to secure a scale 
value for this new item; or is it necessary to rescale all ten again? That is, 
will the information from n — 1—in this case, nine—paired-comparisons 
give as good a scale as the information derived from n(n — 1)/2—here, 
45—comparisons? 


Table 1

Proportionate Frequencies of Preference of Row
Object to Column Object

Actor   Proportions as legible, in source order (columns A to J)    Scale Value

A       .250 .250 .333 .333 .333 .250 [illegible]                       190
B       .500 .500 .500 .583 .500 .750 [illegible]                       705
C       .500 .500 .417 .667 .833 .833 .500 [illegible]                  962
D       .500 .583 .500 .583 .667 .583 .083 [illegible]                  986
E       .417 .333 .417 .500 .583 [illegible] .750 .500                  685
F       .583 .333 .333 [illegible] .500 .667 .333 [illegible]           559
G       .500 .167 .417 .583 [illegible]                                 791
H       .250 .167 .083 .250 .333 .500 [illegible]                       000
I       .583 .500 .417 .500 .667 .833 .500 [illegible]                  875
J       .417 .333 .250 .333 .333 [illegible] .833 .333 [illegible]      433




By erasing one at a time the columns and corresponding rows in 
Table 1, ten new scales of nine items each were developed, each time, of 
course, omitting one of the actors. The scale-values in each of these new 
scales were then different from the scale-values in the criterion scale, since 
they had been constructed without consideration of the cells correspond- 
ing to the actor omitted in each instance. At this time the scale-value 
for the omitted actor in each scale was determined independently on 
the basis of the number of times he was preferred to the other nine. It 
is apparent that now his percentage value, and consequently his scale- 
value, was the same as in the criterion scale, since the same number and 
kind of comparisons were computed for him here as had been computed 
for the criterion scale. But since his relative scale status was changed 
because of the changes in the scale-values of the other nine, his relative 
scale-value was accordingly changed. 
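
The withdrawal and re-insertion of a single item can be sketched as below. This is an illustration under assumptions, not the author's worksheet: the counts for five items and twelve judges are hypothetical, proportions are row means with the diagonal counted as .500, and each scale is shifted so that its lowest value is zero.

```python
from math import sqrt
from statistics import NormalDist

JUDGES = 12
# Hypothetical data: UPPER[(i, j)] judges out of 12 preferred item i to item j.
UPPER = {(0, 1): 7, (0, 2): 8, (0, 3): 9, (0, 4): 10, (1, 2): 7,
         (1, 3): 8, (1, 4): 9, (2, 3): 7, (2, 4): 8, (3, 4): 7}


def proportion_matrix(upper, n, judges):
    """Build the full matrix; the diagonal (self-comparison) is 0.5."""
    P = [[0.5] * n for _ in range(n)]
    for (i, j), count in upper.items():
        P[i][j] = count / judges
        P[j][i] = 1 - count / judges
    return P


def z_from_row(P, i, columns):
    """Standard score of item i from its comparisons with the given columns."""
    return NormalDist().inv_cdf(sum(P[i][j] for j in columns) / len(columns))


def shifted(z):
    lowest = min(z)
    return [value - lowest for value in z]


def criterion_scale(P):
    items = list(range(len(P)))
    return shifted([z_from_row(P, i, items) for i in items])


def single_item_scale(P, omitted):
    """Scale the other items without the omitted one, then re-insert it."""
    items = list(range(len(P)))
    kept = [i for i in items if i != omitted]
    # Kept items ignore the omitted item's cells; the omitted item keeps
    # the same comparisons it had in the criterion scale.
    z = [z_from_row(P, i, items if i == omitted else kept) for i in items]
    return shifted(z)


def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


P = proportion_matrix(UPPER, 5, JUDGES)
criterion = criterion_scale(P)
for omitted in range(5):
    derived = single_item_scale(P, omitted)
    print(omitted, round(pearson_r(criterion, derived), 3))
```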

The foregoing procedure of drawing out one object-choice at a time 
and re-inserting into the scale of nine was then modified to answer the
following question: If single-item insertion in a scale of nine results in a
ten-item scale that is not substantially different from the scale that 
would have resulted had all ten items been originally considered at one 
time (i.e. the criterion scale), then how many items is it possible to insert 
in a scale before the new scale shows an appreciable departure from the 
criterion scale? To this end, six of the ten actors were chosen randomly 
and withdrawn in combination from the scale—first actor X, then actors 
X and Y together, then X, Y, and Z together, and so on. After each 
withdrawal, a scale was constructed of the remaining actors, and the 
withdrawn actors were then reinserted into the scale. 

There were two ways in which these insertions could be made. When
r actors were withdrawn from the original set of n actors, a scale of n − r
actors was constructed. In order to derive scale-values for the r actors,
they could (a) be paired with each of the n − r actors, thus making
r(n − r)¹ comparisons, or (b) the r actors could be paired with one another
as well as with the n − r actors, thus making r(r − 1)/2 + r(n − r) comparisons.
Both of these procedures were carried out.
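
The two insertion procedures can be contrasted in a short sketch. As before, this is a hedged illustration: the counts for five items and twelve judges are hypothetical, the function names are invented for the example, and the diagonal self-comparison of .500 is assumed to enter every row mean.

```python
from statistics import NormalDist

JUDGES = 12
# Hypothetical data: UPPER[(i, j)] judges out of 12 preferred item i to item j.
UPPER = {(0, 1): 7, (0, 2): 8, (0, 3): 9, (0, 4): 10, (1, 2): 7,
         (1, 3): 8, (1, 4): 9, (2, 3): 7, (2, 4): 8, (3, 4): 7}


def proportion_matrix(upper, n, judges):
    P = [[0.5] * n for _ in range(n)]   # diagonal (self-comparison) is 0.5
    for (i, j), count in upper.items():
        P[i][j] = count / judges
        P[j][i] = 1 - count / judges
    return P


def inserted_scale(P, withdrawn, pair_withdrawn_together):
    """Scale the retained items alone, then insert the withdrawn items.

    pair_withdrawn_together=False: procedure (a), r(n - r) comparisons.
    pair_withdrawn_together=True:  procedure (b), r(r - 1)/2 + r(n - r).
    """
    n = len(P)
    kept = [i for i in range(n) if i not in withdrawn]
    z = {}
    for i in range(n):
        if i in withdrawn and pair_withdrawn_together:
            columns = list(range(n))        # compared with every item
        elif i in withdrawn:
            columns = sorted(kept + [i])    # compared with retained items only
        else:
            columns = kept                  # pre-established scale of n - r items
        z[i] = NormalDist().inv_cdf(sum(P[i][j] for j in columns) / len(columns))
    lowest = min(z.values())
    return [z[i] - lowest for i in range(n)]


P = proportion_matrix(UPPER, 5, JUDGES)
scale_a = inserted_scale(P, [3, 4], pair_withdrawn_together=False)
scale_b = inserted_scale(P, [3, 4], pair_withdrawn_together=True)
print([round(v, 3) for v in scale_a])
print([round(v, 3) for v in scale_b])
```

With a single withdrawn item the two procedures coincide, since there are no other withdrawn items to pair it with.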

To summarize, then, one criterion scale and three sets of so-called 
“derived” scales were developed: 


1. Criterion scale: all ten items used; n(n − 1)/2 = forty-five comparisons.

2. Single-item insertion into a previously established scale of nine
items; n − 1 = nine comparisons. (Ten such scales.)

3. Multiple-item insertion into a previously established scale, making
only r(n − r) new comparisons in each case. (Six such scales.)

4. Multiple-item insertion into a previously established scale of n − r
actors, making r(r − 1)/2 + r(n − r) new comparisons in each case.
(Six such scales.)
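
The comparison counts in the summary above can be tabulated directly. The sketch below simply evaluates the stated formulas for n = 10 and r = 1 to 6; the function names are illustrative.

```python
def full_scale_comparisons(n):
    """All possible pairs among n items: n(n - 1)/2."""
    return n * (n - 1) // 2


def insertion_comparisons(n, r, pair_inserted_together=False):
    """New comparisons needed to insert r items into a scale of n - r items."""
    count = r * (n - r)                  # each new item against each old item
    if pair_inserted_together:
        count += r * (r - 1) // 2        # new items against one another
    return count


n = 10
print("criterion scale:", full_scale_comparisons(n))
for r in range(1, 7):
    print(r, insertion_comparisons(n, r), insertion_comparisons(n, r, True))
```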


Results 


The following tables and figures are presented for reference: 

Table 1 is the original matrix of comparison-judgments showing the 
percentage of times the row-object-choice is preferred to the column- 
object-choice. Table 1 also gives the criterion scale, all values con- 
sidered positive, which was derived from the matrix. 

Table 2 presents the ten scales derived from the method of single-item 
insertion, all values positive. The correlations between each of the scales 
and the criterion scale appear at the foot of each scale. 


¹ While the main diagonal of the percentage-preference matrix, corresponding to
self-comparison of each item, was actually used in the construction of all these scales,
it is not included in the discussion above.







Table 2

Single-Item Inserted Scales

Note: The column headings refer to the actor who was withdrawn and re-inserted
into the scale of the remaining nine.

       Criterion
Actor    Scale        A     B     C     D     E     F     G     H     I     J

A         190       334   160   185   158   184   190   130   190   185   185
B         705       836   699   682   655   686   762   655   645   705   659
C         962      1072   966   929   966   948   978   844   904   969   921
D         986      1098  1020   969   926   996  1002   942   904   969   921
E         685       789   709   705   655   679   694   655   622   659   612
F         559       673   523   566   539   568   583   539   506   566   473
G         791       907   803   871   772   780   787   731   692   799   682
H         000       000   000   000   000   000   000   000   000   000   000
I         875       976   874   871   868   898   881   820   809   842   824
J         433       484   429   425   421   452   482   446   316   425   400

r                  .995  .998  .995  .998  .999  .998  .995  .995  .999  .996





Tables 3 and 4 similarly present the scales derived from the method of
multiple-item insertion. Table 3 gives the scales for the r(n − r) comparisons,
and Table 4 gives the scales for the r(r − 1)/2 + r(n − r)
comparisons. Correlations between each of these scales and the criterion
scale similarly appear at the foot of each scale.

Figures 1, 2, and 3 are presented to illustrate graphically the results 
shown in Tables 2, 3, and 4. Figure 1 is a graphical presentation of the
appearance of each item on the scales of preference of actors. The 


Table 3

Multiple-Item Inserted Scales
(Scale Value for Each Inserted Item Determined
on the Basis of r(n − r) Comparisons)

Note: The column headings refer to the actors withdrawn and re-inserted.

       Criterion
Actor    Scale       F.    F.G.   F.G.H.   F.G.H.B.   F.G.H.B.D.   F.G.H.B.D.J.

A         190       190    121     118       058         000           000
B         705       762    711     640       625         546           475
C         962       978    844     759       775         774           696
D         986      1002    954     852       884         795           685
E         685       694    658     580       600         559           432
F         559       583    557     502       444         406           263
G         791       787    721     605       595         546           349
H         000       000    000     000       000         008           047
I         875       881    818     731       706         686           588
J         433       482    502     370       355         350           306

r                  .998   .988    .992      .988        .984          .943













Table 4

Multiple-Item Inserted Scales
(Scale Value for Each Inserted Item Determined
on the Basis of r(r − 1)/2 + r(n − r) Comparisons)

Note: The column headings refer to the actors withdrawn and re-inserted.

       Criterion
Actor    Scale       F.    F.G.   F.G.H.   F.G.H.B.   F.G.H.B.D.   F.G.H.B.D.J.

A         190       190    121     160       115         146           190
B         705       762    711     682       705         705           705
C         962       978    844     801       832         920           886
D         986      1002    954     894       941         986           986
E         685       694    658     622       657         705           622
F         559       583    517     559       559         559           559
G         791       787    749     791       791         791           791
H         000       000    000     000       000         000           000
I         875       881    818     773       763         832           778
J         433       482    502     412       412         496           433

r                  .998   .989    .990      .990        .995          .994





Fig. 1. Single-item inserted scales (see Table 2). The criterion scale is
shown beside the derived scales; scale headings refer to the actor withdrawn
and reinserted into the scale.







Fig. 2. Multiple-item inserted scales (see Table 3). The criterion scale is
shown beside the derived scales; scale headings refer to actors withdrawn
and reinserted.


criterion scale is presented here along with the separate scales derived 
from single-item insertion. Figure 2 gives the scales for multiple-item 
insertion where r(n − r) comparisons were made; and Figure 3 gives the
multiple-item insertion scales when r(r − 1)/2 + r(n − r) comparisons
were made. 

As may readily be seen from the tables and figures above, there is little 
doubt that substantially nothing has been altered in the construction of 
the “derived” type of scale. The scale resulting from inserting items 
into a pre-established scale differs negligibly from a scale developed with 
the use of all possible paired comparisons. The correlations between the 
“derived” scales and the criterion scale are, in every instance, .94 or
over, even when the number of items inserted into the pre-established 
















Fig. 3. Multiple-item inserted scales (see Table 4). The criterion scale is
shown beside the derived scales; scale headings refer to actors withdrawn
and reinserted.


scale exceeds the number already in the scale. In all but one instance 
the correlations are .98 or greater. 


Conclusions 


Generally speaking, it appears that the smaller the number of items
inserted, the higher will be the validity² of the “derived” scale. It is
felt that the validity is roughly inversely proportional to the percentage
of items inserted relative to the number in the pre-established scale.
Particularly when the scale is not a reliable one, as is probably true in the
present case, insertion of more than 50% will tend to lower the validity of the
scale beyond desirable limits. In the opinion of the author, the ratio
r/(n − r) should be no greater than .50. The implication for job evaluation
is that when fifty per cent of the present jobs are altered, or a
correspondingly similar proportion of new jobs is added, a new scale of n
items should be drawn up. Even here, it should not be necessary to
make n(n − 1)/2 new comparisons. It would be sufficient to retain the
(n − r)(n − r − 1)/2 old comparisons, and to add to that r(r − 1)/2
+ r(n − r) new comparisons in order to build a new matrix and scale
from the total of n(n − 1)/2 comparisons.

² In the usual sense of the term, “validity” does not strictly apply here. What is
meant by “validity” is the correlation of a “derived” scale with the criterion scale.
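
That the retained and new comparisons together make up the full set can be checked with a short derivation. Writing m = n − r for the number of items already in the scale (m is a symbol introduced here for convenience, not used in the original):

```latex
\begin{aligned}
\frac{m(m-1)}{2} + \frac{r(r-1)}{2} + rm
  &= \frac{m^{2} - m + r^{2} - r + 2rm}{2} \\
  &= \frac{(m+r)^{2} - (m+r)}{2} \\
  &= \frac{n(n-1)}{2}, \qquad \text{since } m + r = n .
\end{aligned}
```

For example, with n = 10 and r = 5, the three terms are 10 + 10 + 25 = 45, the number of comparisons in the criterion scale.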

In general, the greater the number of judgments possible for the r
items, the higher will be the validity of the “derived” scale. That is,
when the r new items are compared one with another as well as with the
n − r old items, higher correlations result between the “derived” scale
and the criterion scale.

It appears from this study that much can be done in the way of modi- 
fying the construction and use of the paired-comparison scale without 
altering appreciably the units along the scale. It is felt that such sta- 
bility deserves further investigation of the paired-comparison technique. 
Unfortunately, as the number of object-choices increases, the number 
of paired judgments increases so rapidly that the scale falls down under
its own weight. Additional work is needed, then, to test further the 
modifiability of the technique in order to permit a wider range of applica- 
tion. 

The applications of this kind of modification in technique are fairly 
numerous. In the construction of attitude scales, for example, it has 
often been experienced by workers in the field that attitude statements, 
while meaningful during a particular period or for a particular group of 
subjects, lose their applicability and meaning with the passage of time 
or with a change in the characteristics of the group measured. It is at 
that time necessary to delete items from the original scale, and some- 
times necessary to add new ones. It is apparent that any change in an 
item of the scale will change the complexion of the rest of the items in 
the scale. The question to be answered, then, is whether or not the
change in the scale resulting from the alteration of one or more items is
sufficiently large to warrant an entirely new scaling of all items. To a
considerable extent the present study answers such a question. If the empirical findings
here continue to obtain, this type of manipulation with the items of a 
scale that has been derived by means of a paired-comparison technique 
can become quite extensive before the derived scale can be considered 
invalid. 


The implications of this technique of “derived” scales for industry 
are fundamental from a more general point of view. Fortunately or 
unfortunately, the industrial situation seldom meets the rigorous assumptions
involved in statistical techniques that are developed on the
statistician’s desk or in the laboratory. Particularly in the industrial 
situation where personal satisfactions and benefits depend so heavily on 
the assignment of a rating or judgment, is it important that the judgments 
be made with greatest regard for precision and care. Ideally the pro- 
cedures adopted for use should conform with the procedures found to be 
most reliable in the laboratory. However, to the extent that practical 
considerations make impossible the use of orthodox scientific techniques, 
modifications must be introduced to conform with what is practicable. 
Still, from the point of view of scientific awareness alone, if nothing else, 
it is similarly necessary to know precisely what is the extent of reduction 
in the validity and reliability of a measuring instrument or technique as 
a result of modifying the orthodox procedures. It is only when he is 
equipped with such knowledge that the psychologist can deal with his 
data in industry with any real assurance. 


Received August 2, 1948. 




References 


1. Guilford, J. P. Psychometric methods. New York: McGraw-Hill, 1936. 

2. Thurstone, L. L. The method of paired comparison for social values. J. abnorm. 
soc. Psychol., 1927, 21, 384-400. 

3. Uhrbrock, R. S., and Richardson, M. W. Item analysis. Person. J., 1933, 12,
141-154. 










The Paired Comparison Technique for Rating Performance 
of Industrial Employees 


C. H. Lawshe, N. C. Kephart, and E. J. McCormick 


Occupational Research Center, Purdue University 


The method of paired comparisons has been used occasionally for 
making subjective ratings of job performance but has not been commonly 
adopted for this purpose, presumably because of certain disadvantages 
involved in its usual application. These disadvantages especially center 
around two factors: first, the time required, including the preparation of 
the pairs of names of the subjects, the actual rating process, and the sum- 
marizing of the results; and second, the rating process has been con- 
sidered wearying to the raters, particularly if a considerable number of 
individuals are to be rated. 


The Personnel Comparison System ¹


The Personnel Comparison System provides for the rating of job per- 
formance by the paired comparison technique, but the mechanics of its 
administration were specifically designed to simplify the various pro- 
cedures. 


The system lends itself to rating any aspect of employee performance, 
although in most of its applications it has been used for rating over-all job 
performance. The cue for the use of this basic factor as a measure of job 
performance is derived from such studies as that of Ewart, Seashore, and 
Tiffin (1) which brings out high degrees of communality among factors
typically “measured” on rating scales. These authors identified the
factor “Ability to do the present job” which accounted for most of the
variability of the ratings.

The Personnel Comparison System provides the rater with a booklet 
composed of slips of paper about one inch wide and six inches long. See 
Figure 1. Each slip contains one pair of names. To facilitate preparation,
eight slips are initially arranged on one 8½ by 11 form. Pairs of
names are typed on each slip and the slips are later separated by tearing 
along perforated lines. 

1 The Personnel Comparison System for Rating Employee Performance, Copyright
1948 by C. H. Lawshe and N. C. Kephart, is available from Mayer and Company,
15 East Eighth St., Cincinnati 2, Ohio.




Fig. 1. Method of marking pairs in the rating booklet made with the 
Personnel Comparison System materials. 


The procedures involved in the administration of the system and the 
subsequent scoring of results follow: 


1. The names of individual pairs are typed on the separate sections of
the forms according to a pre-determined order which is presented in table
form. The table provides for pairing each employee with each other
employee.


2. The sections are separated on the perforations and the slips are 
assembled into a booklet by means of a paper fastener inserted through 
prepared holes. 

3. The rater checks the preferred name on each slip. 

4. The number of times each individual is preferred is tallied on a 
summary sheet. 

5. A performance rating index is derived from a table,² the specific
index being determined by the number of times each individual was 
preferred and the number of individuals being rated. 


2 The indexes in the table are based on the proportion of times each individual is 
preferred, converted to standard score units. These units are based on a mean of 50 
with a standard deviation of 10. Indexes range from approximately three standard 
deviations below the mean to approximately three standard deviations above the mean 
(actually from 23 to 77). 
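Steps 4 and 5 above can be sketched in code. The conversion table itself is not reproduced in this article, so the sketch below rests on an assumption: the index is taken to be the normal deviate of the proportion of times preferred, rescaled to a mean of 50 and a standard deviation of 10, and held within the stated limits of 23 and 77. The function names and the three employee names are hypothetical.

```python
from itertools import combinations
from statistics import NormalDist


def all_pairs(names):
    """Pair each employee with each other employee exactly once."""
    return list(combinations(names, 2))


def tally_preferences(names, preferred):
    """Count how many times each name was checked as the preferred one."""
    counts = {name: 0 for name in names}
    for winner in preferred:
        counts[winner] += 1
    return counts


def rating_index(times_preferred, n_rated, low=23, high=77):
    """Convert a tally to a standard-score index (mean 50, SD 10).

    Assumption: the proportion preferred is treated as a normal-curve area.
    Proportions of 0 or 1 have no finite deviate, so they map to the limits.
    """
    p = times_preferred / (n_rated - 1)
    if p <= 0:
        return low
    if p >= 1:
        return high
    z = NormalDist().inv_cdf(p)
    return max(low, min(high, round(50 + 10 * z)))


# Hypothetical example: three employees, one rater, one choice per pair.
names = ["Adams", "Baker", "Clark"]
pairs = all_pairs(names)
checked = ["Adams", "Adams", "Baker"]
counts = tally_preferences(names, checked)
indexes = {name: rating_index(c, len(names)) for name, c in counts.items()}
```

Under these assumptions an employee preferred in half of his pairs receives an index of 50, and one who is always or never preferred receives the upper or lower limit.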







Application of the System 


For the purpose of experimentally applying the Personnel Comparison 
System in an operating situation, arrangements were made with a paper 
form manufacturing company to rate employee performance in two 
selected departments. This experimental tryout was directed toward 
the establishment of a criterion for the validation of personnel tests, 
rather than as a merit rating procedure. The raters were asked to rate 
the individuals with the following question in mind, “Which of these two
employees is performing his present job better?” The two departments
in which the system was tried, and the specific provisions for the applica- 
tion of the system in each, are given below. 


1. Offset press department. Twenty-four of the offset pressmen who
were oldest in point of service were rated by three supervisors and an
instructor. All four raters had had an opportunity to become familiar
with the work of each pressman through the systematic rotation of the
pressmen from one shift to another. One booklet including all pairs of
employees was provided for each rater. Ratings were made independently.

2. Stereo press department. Eight stereo pressmen on each of three
shifts were rated. These 24 men had had five or more years of experience
on the job. While each man had previously been rotated between all
three supervisors, they were classified in terms of their present shift
position and the men on each shift were divided into random halves,
called 1-1, 1-2, 2-1, 2-2, 3-1, and 3-2.


On one day, each of the three supervisors (designated A, B, and C)
rated those men then on his shift. On the next day, each supervisor 
rated the same men along with one-half of the men then on each of the 
other shifts. The groups rated by the three supervisors on the first and 
second days are indicated in Table 1. In addition, an instructor (Rater 
D) rated all of the 24 men. 


Table 1

Subgroups Rated by Each of Three Supervisors on Two Days

              Supervisor Making Rating
Group         First Day       Second Day

1-1               A              A, C
1-2               A              A, B
2-1               B              A, B
2-2               B              B, C
3-1               C              B, C
3-2               C              A, C
















Results of Offset Pressmen Study 


The first study, involving the 24 offset pressmen, was conducted to 
determine the reliability of the ratings of different raters. 

The agreement between the four raters is shown in Table 2. This 
table shows the number and per cent of pairs in which the same individual 
was preferred by all four raters; the number of pairs in which three raters 
chose the same man; and the number of pairs in which the raters split 
two-to-two. Of the 276 different pairs rated, all four raters preferred 
the same individual in 227, or 82.3 per cent, of the pairs. In 36 pairs, 
or 13.0 per cent, three raters preferred the same individual. In the 13
remaining pairs, or 4.7 per cent of the total, two of the raters preferred 
one of the individuals, and the other two raters preferred the other 
individual. 
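The tabulation described above can be sketched as follows. With 24 men there are 24 × 23 / 2 = 276 pairs, and each pair is classed by how the four raters divided. The data in the example are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter


def split_pattern(choices):
    """Return the split for one pair, e.g. '4-0', '3-1' or '2-2'."""
    votes = sorted(Counter(choices).values(), reverse=True)
    larger = votes[0]
    return f"{larger}-{len(choices) - larger}"


def agreement_table(choices_by_pair):
    """Number and per cent of pairs showing each split among the raters."""
    counts = Counter(split_pattern(c) for c in choices_by_pair.values())
    total = sum(counts.values())
    return {split: (n, round(100 * n / total, 1)) for split, n in counts.items()}


# Hypothetical choices of four raters on four pairs of employees.
example = {
    ("Adams", "Baker"): ["Adams", "Adams", "Adams", "Adams"],
    ("Adams", "Clark"): ["Adams", "Clark", "Adams", "Adams"],
    ("Baker", "Clark"): ["Baker", "Clark", "Clark", "Baker"],
    ("Adams", "Davis"): ["Adams", "Adams", "Adams", "Adams"],
}
table = agreement_table(example)
```

Applied to the full set of 276 pairs, the same procedure would yield the three rows of Table 2.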


Table 2

Distribution of Preferences of Four Raters on Pairs of Twenty-four Offset
Pressmen by Number and Per Cent of Pairs

Distribution of
Preferences of Four
Raters on Pairs of        No. of       Per Cent
Employees                 Pairs        of Pairs

4-0                        227           82.3
3-1                         36           13.0
2-2                         13            4.7

Total                      276          100.0


Intercorrelation of Ratings. Further analysis of the agreement among
the four raters was accomplished by means of an average intercorrelation
coefficient of the rank orders of the 24 men as resulting from the ratings
of each of the four raters; the resulting average intercorrelation coefficient
of the four rank orders was .94.
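A minimal sketch of such a computation is given below. The article does not state exactly how the average was formed, so the sketch assumes it is the mean of the rank-difference coefficients (rho) over every pair of raters, and that tied tallies share an average rank. The function names and the scores are hypothetical.

```python
from itertools import combinations


def ranks(scores):
    """Rank 1 = most often preferred; tied scores share the average rank."""
    order = sorted(scores, key=scores.get, reverse=True)
    result = {}
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for name in order[i:j + 1]:
            result[name] = shared
        i = j + 1
    return result


def rho(scores_a, scores_b):
    """Rank-difference correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    ra, rb = ranks(scores_a), ranks(scores_b)
    n = len(ra)
    d2 = sum((ra[k] - rb[k]) ** 2 for k in ra)
    return 1 - 6 * d2 / (n * (n * n - 1))


def average_intercorrelation(score_sets):
    """Mean rho over every pair of raters (an assumed definition)."""
    pairs = list(combinations(score_sets, 2))
    return sum(rho(a, b) for a, b in pairs) / len(pairs)


# Hypothetical times-preferred tallies from three raters on four men.
rater_1 = {"Adams": 3, "Baker": 2, "Clark": 1, "Davis": 0}
rater_2 = {"Adams": 3, "Baker": 2, "Clark": 1, "Davis": 0}
rater_3 = {"Adams": 0, "Baker": 1, "Clark": 2, "Davis": 3}
average = average_intercorrelation([rater_1, rater_2, rater_3])
```

The rank-difference formula is exact only when there are no ties; with tied tallies it is an approximation.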

Reliability of Ratings on Halves and Quarters. In order to examine 
possible differences in reliability that would result from the rating of 
smaller groups of the same employees by the four raters, average inter- 
correlations were also computed for chance halves and chance quarters of 
these 24 offset pressmen. The two chance halves included odd-numbered
and even-numbered employees respectively, the numbers having been 
assigned by alphabetical order of names. The chance quarters, in turn, 
were made up of every fourth name in the list in the same fashion. Only 
the preferences on pairs of employees included in the particular chance 
half or chance quarter in question were considered. Within each such 
group the number of times each employee was preferred was tallied, and
rank orders of the men in each group were subsequently determined. The
average intercorrelations, computed by the rank-order method, are given 
in Table 3. These average intercorrelations closely approximate the 
coefficient of .94 obtained with the whole group. Even the correlation 
of .85 can reasonably be considered as satisfactory since only six men are 
involved. 
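The selection of chance halves and quarters, and the re-tallying restricted to pairs inside each group, can be sketched as below. The employee labels and the preference data are hypothetical; only the every-second and every-fourth selection from the alphabetical list follows the text.

```python
from itertools import combinations


def chance_groups(names, k):
    """Split an alphabetical list into k groups by taking every k-th name."""
    ordered = sorted(names)
    return [ordered[i::k] for i in range(k)]


def tally_within(group, preferences):
    """Count preferences using only pairs whose two members are in the group.

    preferences maps a (name, name) pair to the name the rater checked.
    """
    members = set(group)
    counts = {name: 0 for name in group}
    for (a, b), winner in preferences.items():
        if a in members and b in members:
            counts[winner] += 1
    return counts


# Hypothetical roster of 24 men and one rater who always prefers the
# man listed first in each pair.
names = [f"P{i:02d}" for i in range(1, 25)]
preferences = {(a, b): a for a, b in combinations(names, 2)}

halves = chance_groups(names, 2)
quarters = chance_groups(names, 4)
first_quarter_tally = tally_within(quarters[0], preferences)
```

Each quarter contains six men and therefore only 15 of the 276 pairs, which is why a somewhat lower coefficient for a single quarter is not surprising.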


Table 3

Average Intercorrelations of Rank Order of Times Preferred of Chance Halves
and Chance Quarters of Twenty-four Offset Pressmen

                                 Average
Group                            Intercorrelations

Chance halves
  1st half                         .96
  2nd half                         .93
  Average of 2 halves              .94

Chance quarters
  1st quarter                      .97
  2nd quarter                      .85
  3rd quarter                      .93
  4th quarter                      .94
  Average of 4 quarters            .92





Reliability of Ratings on Restricted Range Group. A further analysis 
of this same character was made with respect to a selected group of the 
24 pressmen representing a restricted range of talent. The overall group 
included three floormen (working supervisors), thirteen “A” pressmen,
seven “B” pressmen, and one helper. The 13 “A” pressmen (who
operate somewhat more complex offset presses) were selected from the 
group for separate analysis, and the number of times each of these was 
preferred over the others within this same group was tallied. The re- 
sulting average intercorrelation of the rank orders of this group was .79. 

This reduction in average intercorrelation from that of the overall
group and from those for the chance halves and chance quarters would
be expected since the group of “A” pressmen was much more restricted
in its range of talent, and, generally speaking, tended to fall within the
central and above-average (though not extreme top) range of the distribution
of the entire group. The floormen consistently were rated above
the “A” pressmen, and to a considerable extent the “B” pressmen and
the helper tended to be rated toward the lower end of the over-all group.

The ratings of these 13 “A” pressmen were then subjected to a different
type of analysis. The relative rank orders of these 13 men were
“extracted” from the rank orders of the entire group; they were then
compared with the rank orders resulting from the preferences on only the
pairs of men in this sub-group. The rho correlation between these two 
rank orders was .996, indicating that there was practically no displace- 
ment in rank-order position among these 13 men when their rank order 
was derived from the ratings made exclusively on this group, as compared 
with their relative rank orders when “extracted” from that of the whole 
group. 
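The comparison just described can be sketched as follows. One order is “extracted” from the whole-group tallies; the other is built from pairs inside the sub-group only; the two are then compared by rho. The names, the preference data and the handling of ties (left in input order) are assumptions made for illustration.

```python
def extracted_order(full_counts, subgroup):
    """Relative order of sub-group members taken from whole-group tallies."""
    return sorted(subgroup, key=lambda name: full_counts[name], reverse=True)


def subgroup_order(subgroup, preferences):
    """Order of sub-group members from pairs inside the sub-group only."""
    members = set(subgroup)
    counts = {name: 0 for name in subgroup}
    for (a, b), winner in preferences.items():
        if a in members and b in members:
            counts[winner] += 1
    return sorted(subgroup, key=lambda name: counts[name], reverse=True)


def rho_between_orders(order_x, order_y):
    """Rank-difference correlation between two orders without ties."""
    n = len(order_x)
    position_y = {name: i for i, name in enumerate(order_y)}
    d2 = sum((i - position_y[name]) ** 2 for i, name in enumerate(order_x))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical choices of one rater on all ten pairs of five men.
preferences = {
    ("A", "B"): "A", ("A", "C"): "A", ("A", "D"): "A", ("A", "E"): "E",
    ("B", "C"): "B", ("B", "D"): "B", ("B", "E"): "B",
    ("C", "D"): "C", ("C", "E"): "C",
    ("D", "E"): "D",
}
full_counts = {"A": 3, "B": 3, "C": 2, "D": 1, "E": 1}
subgroup = ["B", "C", "D"]

from_whole = extracted_order(full_counts, subgroup)
from_subgroup = subgroup_order(subgroup, preferences)
agreement = rho_between_orders(from_whole, from_subgroup)
```

A coefficient close to 1, such as the .996 reported above, means that restricting the tally to the sub-group's own pairs leaves the relative order nearly unchanged.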


Results of Stereo Pressmen Study


As indicated before, the eight stereo pressmen on each shift were split 
into chance halves. On one day each supervisor rated the eight men 
together, and on the subsequent day each supervisor rated the same 
eight men along with one of the halves of each of the other shifts. The 
instructor rated all 24 men on one occasion. 

In order to determine the correlations between subsequent ratings on 
men rated twice by the same supervisor, or on ratings by two or more 
raters on men rated in common, only the pairs of names pertinent to any 
such specific analysis were used in tallying the number of times each man 
was preferred. The rank-difference correlation coefficients (rho) between 
the several combinations of ratings are given in Table 4. 


Table 4

Rank-Difference Correlations (Rho) of Various Ratings on
Twenty-four Stereo Pressmen

              Groups          No. of Men      Coefficient of
Rater         Rated           in Group        Correlation (Rho)

First and Second Ratings by Each of Three Supervisors
A             1-1, 1-2            8               .98
B             2-1, 2-2            8              1.00
C             3-1, 3-2            8               .94
Average                                           .97

Ratings by Two Different Supervisors
A & B         1-2, 2-1            8               .81
A & C         1-1, 3-2            8               .83
B & C         2-2, 3-1            8               .86
Average                                           .83

Ratings by Each of Three Supervisors and One Instructor
A & D                            16
B & D                            16
C & D                            16
Average                                           .87










Reliability of Two Ratings by Three Supervisors. The initial analysis 
of the ratings of the stereo pressmen was that of the reliability of the 
two ratings made by each of the three supervisors of the eight men who 
were then under their respective supervision. The rank-difference cor- 
relations (rho) between the two ratings made by each of the supervisors 
ranged from .94 to 1.00, with an average of .97, which reflects a highly 
satisfactory degree of consistency between the ratings. 

Reliability of Ratings Among Three Supervisors. As indicated above, 
eight men were rated in common by supervisors A and B, eight others 
were rated in common by supervisors A and C, and eight others were 
rated in common by supervisors B and C. The rank-difference correlations
between the two ratings of each of these three groups ranged from
.81 to .86, with an average of .83. While these coefficients between 
ratings made by different supervisors are somewhat below the coefficients 
of the two ratings made by the same supervisors on men whom they 
rated on successive days, they can nevertheless be considered as re- 
flecting an adequate degree of consistency among the three raters. 

Reliability of Ratings Between Three Individual Supervisors and One 
Instructor. Each supervisor rated 16 men, while all 24 were rated by 
the instructor. The rank-difference correlation coefficients between the
ratings of each of the supervisors and the ratings of the instructor ranged 
from .83 to .90, with an average of .87. 


Table 5

Average Intercorrelations of Ratings by Three Raters of Three
Groups of Eight Stereo Pressmen

                     Average
Raters               Intercorrelation

A, B, D                .84
A, C, D                .76
B, C, D                .87

Average                .82





Reliability of Ratings of Three Raters. Since each of three groups of 
eight stereo pressmen was rated by two different supervisors and by the 
instructor, it was possible to determine the average intercorrelations of 
the rank-orders resulting from the three ratings on each of these groups 
of eight men. These average intercorrelations were .76, .84 and .87, the 
average of the three being .82. (See Table 5.) This average is lower 
than the average of the other measures of reliability previously men- 
tioned, but is within the same relative range as those of the other meas- 
ures of reliability. 













Administration of Rating System 


Time Required for Administration. The time required for applying 
the rating system to the 24 offset pressmen may give a rough indication 
of the practical feasibility of the system in somewhat comparable cir- 
cumstances. It was estimated that it took a total of 12 hours to type 
the slips for the 276 pairs (including carbon copies for the four raters), to 
assemble the four booklets, to rate the workers, and to derive the rating 
indexes. This time did not include planning, conference, or administra- 
tive time, but did include the time required for the rating by all four 
raters. In view of the fact that time required for functions such as 
typing and separation of the slips does not increase proportionately with 
the number of different raters, the over-all time is not indicative of the 
time that would be required if the rating were done by one rater rather 
than by four. It is estimated that the time required to prepare material 
and to summarize results for a complete rating of the 24 men by one rater 
would be about five or six hours. 

The actual time required for each rater to rate the 276 pairs, however, 
was only about 30 minutes. This time required for actual rating is 
sufficiently reasonable to raise a question about the comments made by 
Guilford (2) and made in the report of the National Industrial Conference 
Board (3) to the effect that the method of paired comparisons is, by its 
nature, excessively wearying to the raters. More specifically, there is 
reason to doubt the limit of 15 subjects implied by Guilford as the upper 
limit of the practical application of the technique. Perhaps the me- 
chanics of the specific scheme provided for making the ratings have a 
significant bearing on the degree to which the system is acceptable to the 
raters, and consequently on the total number of subjects that can reason- 
ably be rated by one individual. 
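The arithmetic behind these remarks can be sketched briefly. The number of pairs grows as n(n - 1)/2, so 15 subjects give 105 pairs and 24 subjects give 276. The projection to other group sizes assumes that a rater keeps the same pace per pair, which the study did not test; the function names are hypothetical.

```python
def number_of_pairs(n):
    """Number of distinct pairs among n subjects: n(n - 1) / 2."""
    return n * (n - 1) // 2


def seconds_per_pair(n, minutes):
    """Average time per judgment when n subjects are rated in the given time."""
    return minutes * 60 / number_of_pairs(n)


def projected_minutes(n, pace_seconds):
    """Rating time for n subjects, assuming a constant pace per pair."""
    return number_of_pairs(n) * pace_seconds / 60


pace = seconds_per_pair(24, 30)          # about 6.5 seconds per pair
time_for_15 = projected_minutes(15, pace)
```

At the observed pace, the 105 pairs required for 15 subjects would take somewhat under 12 minutes.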

In considering the over-all time required for all the processes there was 
no suggestion that this time was considered excessive by the company 
applying the system to these two groups of workers. 


Summary and Conclusions 


Two groups of 24 workers each were rated by the paired comparison 
technique using the Personnel Comparison System. One of the groups 
included 24 offset pressmen who were all rated by three supervisors and 
one instructor. The other group included 24 stereo pressmen, eight 
from each of three shifts; each supervisor rated the eight men on his 
own shift on one day, and on the next day he rated the same men along 
with one-half of the men on each of the other shifts, making a total of 
16 men. An instructor rated all 24 stereo pressmen once. 







Analyses of the resulting ratings brought about the following primary 
conclusions: 


1. There was a high degree of reliability between the ratings of two or 
more raters who rated the same employees. 

2. There was a high degree of reliability between successive ratings, 
made on different days by each of three raters, on the employees whom 
they individually supervised. 

3. The analysis of the ratings of a selected subgroup of employees 
revealed very little relative displacement in their rank-order position 
derived from the ratings on only the selected employees, as compared 
with their relative rank-order positions “extracted” from the ratings of 
the larger group of which they were a part. 

4. The evidence accumulated did not indicate that the time required 
of raters was excessive. 


Received November 24, 1948. 
Early publication. 


References 


1. Ewart, E., Seashore, S. E., and Tiffin, J. A factor analysis of an industrial merit 
rating scale. J. appl. Psychol., 1941, 25, 481-486. 

2. Guilford, J. P. Psychometric methods. New York: McGraw-Hill Book Company, 
Inc., 1936. 

3. Employee rating; methods of appraising ability, efficiency, and potentialities. National 
Industrial Conference Board, Studies in Personnel Policy No. 39, 1942. 














Flesch Count and Readership of Articles 
in a Midwestern Farm Paper * 


Howard B. Lyman 
East Texas State Teachers College, Commerce, Texas 


A preliminary study of the readers of Wallaces’ Farmer and Iowa 
Homestead suggested in March, 1946 that reducing the Flesch count of 
articles from 3.5 to 1.5 might substantially increase the number of sub- 
scribers reading that article. To investigate this clue, a similar survey 
was set up in November, 1946. 

The state of Iowa was divided into alternating counties, designated as 
“A” and “B” for the purposes of this report. The editor reveals that 
there may have been some sectional bias in the results, inasmuch as the 
“A” group of counties were a little heavier towards the southwest and
the “B” counties heavier to the northwest.

Papers for November 16, 1946 were run off with four articles printed 
in alternate forms (one with a Flesch count of approximately 3.5, the 
other with a count of approximately 1.5),¹ two of the difficult and two
of the easy forms appearing in each copy of the issue. Typography, 
illustrations, leads, subject matter, and position of the articles were 
identical; only the difficulty level was varied. The experimental copy 
was distributed to all subscribers in the “A” counties, the control copy
to all subscribers in the “B” counties. An excerpt from both forms of
Article 4 (Nylons) is given in Figure 1. 


Lower Flesch Count Version

Edna, my neighbor, was lucky.
She has a big family. In 1940, she
bought a pretty green nylon and wool
coat for Bonnie, her eldest daughter.

Bonnie wore the coat for two
years. Then, when she became a
war bride, she got a new coat that
would match her wedding suit.


Higher Flesch Count Version

Nylon doesn’t always mean just a
precious pair of sheer stockings any
more. It can mean any number of
bright, new garments that are made of
nylon.

There are blouses, slips, children’s
clothes, coats and such things as curtains,
rugs, and upholstery materials.


Fig. 1. Introductory paragraphs from Article 4 (Nylons).


* From data collected and processed by the Farmer-Homestead Poll and made
available to the writer by Donald R. Murphy, Editor of Wallaces’ Farmer and Iowa
Homestead, under whose direction the surveys were made. The writer of this article
has merely prepared the data for publication in this journal, since he feels it suggests a
method of interest to psychologists. Murphy has previously reported the results in an
advertising trade journal (3, 4).

1 Flesch counts (1) computed by the United States Department of Agriculture
Readability Unit.







From nine to thirteen days after publication, interviewers were sent 
out with instructions for obtaining a random sample of subscribers (“Go
to first cross-roads north, turn to the right, call at every third farm”).
If the farmer was a subscriber, he was asked: “Did you HAPPEN to see
or read anything on this page?” Table 1 shows the final size of the 
sample after eliminating non-subscribers. 


Table 1

Size of Subscriber Samples in “A” and “B” Counties

              “A” Counties     “B” Counties     Total

Male               73               76           149
Female             75               83           158

Total             148              159           307





A quantitative score for each article was obtained by the following
system: a rating of 1 if the respondent indicated reading one-quarter or
less of the article; 2, if one-quarter to one-half; 3, if one-half to three-quarters;
and 4, if three-quarters to total. By using this system and correcting
for difference in size of N and for variation in areas, it was possible
to determine the per cent of greater readership for the articles with the
lower Flesch count. These facts are presented in Table 2. The female 
scores for articles 1 and 3 and the male scores for article 4 were thrown 
out because of the small N’s. 
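The scoring system can be sketched as below. The corrections for size of N and for variation in areas are not described in this article, so the sketch simply compares mean ratings, which is one possible way of allowing for unequal N and is an assumption. Assigning an exact boundary value (for example, exactly one-quarter) to the lower rating is also an assumption, and all figures in the example are invented.

```python
def readership_rating(fraction_read):
    """Rating of 1 to 4 from the fraction of the article reported read."""
    if fraction_read <= 0.25:
        return 1
    if fraction_read <= 0.50:
        return 2
    if fraction_read <= 0.75:
        return 3
    return 4


def mean_rating(fractions):
    """Mean rating for one version of an article (assumed correction for N)."""
    ratings = [readership_rating(f) for f in fractions]
    return sum(ratings) / len(ratings)


def percent_greater(easy_score, hard_score):
    """Per cent by which the easy version's score exceeds the hard version's."""
    return 100 * (easy_score - hard_score) / hard_score


# Hypothetical reports from readers of the two versions of one article.
easy_version = [1.0, 0.8, 0.6, 0.3]
hard_version = [0.9, 0.4, 0.2]
gain = percent_greater(mean_rating(easy_version), mean_rating(hard_version))
```

A positive result indicates greater readership for the version with the lower Flesch count; a negative result indicates a loss.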

No figures on significance are reported, several statisticians having 
stated that the data are not amenable to any of the standard tests; how- 
ever, after deriving the corrected quantitative readership score, it was 
found that four articles showed a positive difference (i.e., an increase in 
readers for the lower Flesch count) and one a negative difference. The 
low count version of the editorial (Article 2) showed a 9.4% decrease in 
readership for the women. Increases in readership for the other low 
count articles ranged from 7.3% to 66.0%. 

Wallaces’ Farmer reports that by deliberate attempts to keep copy 
more readable, they have been able to lower the range of most articles to 
between 1.5 and 4.0. Prior to this policy, articles had ranged from about 
3.0 to 6.0. Routine reader surveys have shown consistent increases in 
readership, and the lower Flesch counts are considered at least a con- 
tributory factor to this increased popularity. 

In the opinion of the writer, even better results may be anticipated 
from the use of the new Flesch readability yardstick (2). The old 
















Table 2

Difference in Readership Scores for Articles When Copy is
Simplified by Use of Flesch Principles

                              Flesch      No. of      Raw Readership*
Subject         County        Count       Readers     Score

Hogs                          1.5           40          49.3
                              3.85          50          57.9
                              1.5           **          **
                              3.85          **          **

Editorial                     1.76          36
                              4.27          24
                              1.76          23
                              4.27          23          30.7

                              1.35          51          65.8
                              3.47          37
                              1.35          **          **
                              3.47          **          **

                              1.11          **          **
                              2.48          **          **
                              1.11          44          52.3
                              2.48          40          43.7





* After correction of the scores to make them comparable for the two groups, one 
low Flesch count article (2B Males) showed a loss of 9.4% in readership. Two others 
(1 and 3, Females) were dropped because of the small N. The other four showed in- 
creases ranging from 7.3% to 66.0%. 

** Not computed, since N was less than 20. 


formula yielded an ambiguous index in which difficulty and interest were 
combined. The new formula is not only simpler to apply but it measures 
difficulty and interest separately. 


Received June 23, 1948. 


References 


1. Flesch, R. The art of plain talk. New York: Harper and Brothers Publishers, 1946. 

2. Flesch, R. A new readability yardstick. J. appl. Psychol., 1948, 32, 221-233. 

3. Murphy, D. R. Test proves short sentences and words get best readership. Printer’s
Ink, 1947, 218, 61-64. 

4. Murphy, D. R. How plain talk increases readership 45% to 66%. Printer’s Ink, 
1947, 220, 35-37. 





Speed of Reading Nine Point Type in Relation 
to Line Width and Leading * 


Miles A. Tinker and Donald G. Paterson 
The University of Minnesota 


The writers have previously reported the optimal limits for good read- 
ability within which line width and leading may be varied for 6 point, 
8 point, 10 point, 11 point, and 12 point type sizes.¹ The results appear
to be specific for each type size. Nine point type was found to be as 
readable as 10 point, 11 point, and 12 point type when each was printed 
with two point leading, and with its own optimal line width. 

It was believed to be important to establish for 9 point type the same 
information previously reported for each of the five type sizes mentioned 
above. The purpose of the present study, therefore, is to determine the 
influence of variation of line width and leading for 9 point type. The 
method used was the same as in previously reported studies.¹

A table giving detailed tabulated results for the twenty test groups 
of 100 sophomore laboratory students each is on file with the American 
Documentation Institute.² This table is not reproduced here because
of its excessive size and detail which would be of primary interest only 
to the research scholar. 

Table 1, however, presents in convenient summary form a guide to be 
followed by those who desire to specify the optimal limits of variation in 
line width and leading when nine point type is to be used. 

In setting up the study, we used, as a standard, material printed in 
18 pica line widths with 2 point leading. The line width variations and 
the leading variations shown in Table 1 were each compared in turn with 
the standard. The differences are shown as percentage increases or de- 
creases (minus sign) in speed of reading. For example, the test material 
printed in an 8 pica line width, set solid, was read 9.5 per cent more slowly 
than the standard, whereas the test material printed in the same short 
line width with 1 point leading was read 4.8 per cent more slowly than the 
standard. Other entries in Table 1 are to be interpreted in a similar
manner. 

* Grateful acknowledgment is given to the Graduate School, University of Minnesota, 
for research grant to finance this study. 

1 Paterson, D. G., and Tinker, M. A. How to make type readable. New York: 
Harper and Brothers, 1940. (Obtainable from the writers.) See Chapter 7, pp. 72-81. 
Also see Appendix I, Methodology, pp. 161-189. 

2 This table is available as ADI Documents in the form of microfilm (images one inch
high) on standard 35 mm. motion picture film, or photoprints (6 X 8 inches in size)
readable with unaided eyes. To secure this table order Document 2626 remitting $0.50
for photocopy or microfilm from American Documentation Institute, Science Service
Building, 1719 N Street, N.W., Washington 6, D. C.




Table 1

Simultaneous Variation of Line Width and Leading for Nine Point Type

Note: Reading speeds for 8, 14, 18, 30 and 40 pica line widths each set solid and
leaded 1 point, 2 points and 4 points are compared (percentage differences) with reading
speed for Scotch Roman printed in 18 pica line width leaded 2 points as a standard.
Minus (-) differences indicate slower reading than the standard. Figures in bold face
indicate extremely unsatisfactory typographical arrangements. Number of readers
= 2000 university sophomores.

Line         Set        1 Point      2 Point      4 Point
Width        Solid      Leading      Leading      Leading

  8          -9.52       -4.75        -5.76        -6.78
 14          -4.39        0.68         0.46         1.30
 18          -2.72        0.23         0.00         3.24
 30          -5.17       -0.45         2.43         0.40
 40          -5.83       -3.97        -5.81        -2.57





Examination of Table 1 shows that the region of optimal legibility
extends from a 14 pica line width with 1 point leading or more to a line
width of about 30 picas with 1 point leading or more.

All differences amounting to a 4 per cent or greater decrease in legi- 
bility are indicated by the use of bold face type. Such differences are 
significant beyond the 1 per cent level. This permits ready identification 
of typographical arrangements that should not be used. The 2.72 per 
cent decrease in reading rate for 18 pica line width set solid is significant 
at about the 3 per cent level. While this arrangement may be used 
without a large retardation in reading rate, it is not recommended. The
same is true of the 2.57 per cent decrease in reading rate for 40 pica line 
width with four point leading. 
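The rule for marking unsatisfactory arrangements can be sketched as a simple lookup. The figures are transcribed from Table 1 as it appears here, and the variable and function names are hypothetical. Since the bold face of the printed table is not preserved, the sketch applies the stated 4 per cent rule literally; one entry (-3.97 for 40 picas with 1 point leading) falls just short of it.

```python
# Percentage differences in reading speed from the standard (18 picas, 2 point
# leading). Keys are line widths in picas; each row gives set solid, 1 point,
# 2 point and 4 point leading, in that order.
LEADINGS = (0, 1, 2, 4)
TABLE_1 = {
    8: (-9.52, -4.75, -5.76, -6.78),
    14: (-4.39, 0.68, 0.46, 1.30),
    18: (-2.72, 0.23, 0.00, 3.24),
    30: (-5.17, -0.45, 2.43, 0.40),
    40: (-5.83, -3.97, -5.81, -2.57),
}


def unsatisfactory(table, threshold=-4.0):
    """Arrangements read at least 4 per cent more slowly than the standard."""
    return [
        (width, leading)
        for width, row in table.items()
        for leading, difference in zip(LEADINGS, row)
        if difference <= threshold
    ]


flagged = unsatisfactory(TABLE_1)
```

Each flagged item is a (line width, leading) combination, with a leading of 0 standing for type set solid.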

As was true with the other type sizes studied and previously reported, 
one can specify line widths for 9 point type over a considerable range (in 
this instance, from 14 to 30 picas) provided one to four points of leading 
are used. Conservative practice would probably specify one or two point 
leading for 9 point type in line widths varying from 16 picas to 24 picas. 
Our studies of reader preferences show that readers dislike long lines and 
very short lines. 

Summary 


The present study was carried out to determine the influence of line 
width and leading on the speed of reading 9 point type. 

The results indicate that optimal rate of reading occurs with line 
widths of 14 to 30 picas and with 1 to 4 points leading. This may be 
considered the zone of safety. 

A conservative range would be 16 to 24 pica line width with 1 or 2 
points leading when 9 point type is used. 


Received June 10, 1948. 





Effect of Target Brightness on “Normal” and
“Subnormal” Visual Acuity *


James E. Kuntz ** and Robert B. Sleight †
Division of Education and Applied Psychology, Purdue University 


Ferree and Rand (1) and Ferree, Rand, and Lewis (2) have described 
the influence of illumination on the visual acuity of a few persons of 
widely divergent ages and with greatly varied visual abilities. They 
concluded (2): “Lighting practice has been conventionalized much too
narrowly with respect to intensity of light.” Tinker (3) believes that
the studies referred to above “. . . suggest a moderate increase in 
illumination for those with corrected vision as compared with normal 
eyes.” 

The purpose of this experiment was to investigate this problem further 
by comparing the performance of a group of people with “subnormal”
visual acuity with a group having “normal” visual acuity on a task of
visual discrimination under varying brightness levels. 

In this experiment those persons were considered subnormal who demonstrated
a visual acuity below 1.0 and those considered normal who
had a visual acuity above 1.0 in decimal notation when measurements were 
made at a distance of 28 inches and with a brightness level of ten foot- 
lamberts on the same test as used in the experiment reported on in this 
paper. There were 12 Ss in the subnormal group and 12 in the normal 
group. 

Six brightness levels were used, viz., 3.16, 10, 31.6, 100, 316, and 1000
footlamberts. These conform in log terms to $10^{0.5}$, $10^{1}$, $10^{1.5}$, $10^{2}$, $10^{2.5}$,
and $10^{3}$.
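The six levels are equally spaced in logarithmic terms, half a log unit apart. A short sketch reproduces them; the function name and the rounding to three significant figures are choices made here for illustration.

```python
def brightness_levels(start_log=0.5, step=0.5, count=6):
    """Brightness levels in footlamberts, equally spaced on a log10 scale."""
    return [10 ** (start_log + step * i) for i in range(count)]


levels = brightness_levels()
rounded = [float(f"{value:.3g}") for value in levels]
```

Each level is about 3.16 times the one before it, so the whole series spans a ratio of roughly 316 to 1.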


*This research was supported by a subcontract between the Purdue Research 
Foundation and The Johns Hopkins University. The subcontract was part of Con- 
tract N5-ori-166, Task Order I, Project Designation Number NR-784-001, between 
Special Devices Center, Office of Naval Research, and The Johns Hopkins University. 
This article is Report No. 166-I-67 under that contract. The authors wish to express 
appreciation for the advice given by Drs. N. C. Kephart, L. M. Baker, and J. A. Bromer 
in the planning of this experiment and the preparation of this report. 

** Present address: Division of Education and Applied Psychology, Purdue Uni- 
versity, Lafayette, Indiana. 

† Present address: Psychological Laboratory, The Johns Hopkins University,
Institute for Cooperative Research, Baltimore 2, Maryland.
















James E. Kuntz and Robert B. Sleight 


Apparatus and Acuity Targets 


Figure 1 is a schematic diagram of the apparatus used in this experiment. 
Illumination was provided by electric lamps (L) varying in wattage from 15 
to 500 mounted on a frame 52 inches square. The lamps were arranged so 
that glare in the visual field was almost entirely eliminated. In the center of 
the frame was an opening 6 inches square into which was fitted an eye shield 
(ES). The subject sat on seat (S) and read the target (T) at a distance of 
28 inches through this opening. A light shield (LS) was provided to eliminate 
direct light from the lamps striking the S’s eye. A hood (H) was also used to 
shield the subject from light reflected into the surrounding room. 


Fig. 1. Schematic diagram of apparatus used for the study of visual acuity (T = 
Acuity target, B = Background surface, LS = Light shield, ES = Eye shield, H = Hood, 
S = Seat, L = Lamps, SW = Switches, C = Constant voltage transformers, R = 
Rheostats). 


The background (B) consisted of fine-grain wood covered with several coats 
of nonspecular white paint and extended to the limits of vision. The checker- 
board target (T) was mounted on a piece of cardboard 6 inches square. The 
cardboard was of approximately the same texture and albedo as the 
background. 

Constant voltage transformers (C) were used in the lines. The MacBeth 
Illuminometer was used daily, with measurements made at the position of the 
S’s eye, to insure correct brightness levels. Brightness levels were readily 
changed by a series of switches (SW) which controlled the lamps of various 
wattage. Rheostats (R) were used to make very minor adjustments in 
brightnesses. 

The acuity targets were of the type used in the Bausch and Lomb Ortho- 
Rater for measuring far and near acuity. Each target consisted of a checker- 
board to be discriminated from gray areas in order to locate it in one of four 





Effect of Target Brightness 85 


possible positions. The task consisted of locating the position of the checker- 
board in a series of such targets progressively diminishing in size. The actual 
size of detail to be discriminated was as follows: 0.0135, 0.0102, 0.0080, 0.0067, 
0.0058, and 0.0050 inches. (These sizes in terms of visual angle are 1.66, 1.25, 
0.98, 0.82, 0.71, and 0.61 minutes of arc, respectively.) The decimal acuity notations were 
determined by calculating the reciprocal of the visual angle subtended by the 
task object in each acuity target. The decimal notations which corresponded 
to each of the above targets were: .6, .8, 1.0, 1.2, 1.4, and 1.6 respectively. 
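
The relation between detail size, visual angle, and decimal acuity described above can be sketched as follows. The function and constant names are illustrative assumptions; the detail sizes and the 28-inch viewing distance are taken from the text, and the visual angle is assumed to be expressed in minutes of arc.

```python
import math

VIEWING_DISTANCE_IN = 28.0  # viewing distance stated in the text, inches
DETAIL_SIZES_IN = [0.0135, 0.0102, 0.0080, 0.0067, 0.0058, 0.0050]

def visual_angle_minutes(detail_in, distance_in=VIEWING_DISTANCE_IN):
    """Angle subtended by a detail of the given size, in minutes of arc."""
    return math.degrees(math.atan(detail_in / distance_in)) * 60.0

def decimal_acuity(angle_minutes):
    """Decimal acuity notation: reciprocal of the visual angle in minutes."""
    return 1.0 / angle_minutes

angles = [visual_angle_minutes(size) for size in DETAIL_SIZES_IN]
acuities = [decimal_acuity(angle) for angle in angles]

for size, angle, acuity in zip(DETAIL_SIZES_IN, angles, acuities):
    print(f"{size:.4f} in  {angle:.2f} min  acuity {acuity:.1f}")
```

Rounded to two places the angles agree with the values quoted, and rounded to one place the reciprocals give the six decimal notations.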


Procedure 


Targets were presented in a sequence designed to help eliminate the influ- 
ence of different degrees of motivation, and to cancel the effects of learning 
and fatigue. The starting points (levels of brightness) for the Ss were rotated 
so that in each group two people began the experiment at each level and 
continued “up” the brightness scale until the highest brightness was reached. 
Those who did not begin at the lowest level then went to the lowest level and 
continued “up” to the point of beginning. There were 10 randomized presen- 
tations with regard to position for each target. If S made five or more 
correct responses out of the ten target presentations, E then proceeded to 
administer the next smaller target until a target size was reached on which S 
made fewer than five correct responses in ten trials. All Ss were required to 
make a response for each target presentation. The succeeding target presen- 
tations were made in approximately 3 seconds after each response with the 
brightness constant at the level being used at the time. There was no time 
limit for making responses. Two minutes were allowed for adaptation in 
going “up” the brightness scale and 5 minutes when going from the highest 
to the lowest level. 
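
The rotation of starting levels and the adaptation periods can be sketched as below. This is an illustration under stated assumptions: the names `presentation_order`, `adaptation_minutes`, and `starts` are hypothetical, and the assignment of particular subjects to particular starting levels is not given in the text.

```python
LEVELS = [3.16, 10, 31.6, 100, 316, 1000]  # footlamberts
SUBJECTS_PER_GROUP = 12

def presentation_order(start_level, levels=LEVELS):
    """Go "up" from the starting level to the highest, then from the lowest
    level up to the level just below the starting point."""
    i = levels.index(start_level)
    return levels[i:] + levels[:i]

def adaptation_minutes(order):
    """Adaptation time before each change of level: 2 minutes when going up,
    5 minutes when dropping from the highest to the lowest level."""
    return [2 if current > previous else 5
            for previous, current in zip(order, order[1:])]

# Two subjects in each group of twelve begin at each of the six levels.
starts = [LEVELS[k % len(LEVELS)] for k in range(SUBJECTS_PER_GROUP)]

order = presentation_order(31.6)
print(order, adaptation_minutes(order))
```

A subject starting at the lowest level never makes the drop from 1000 to 3.16 footlamberts, so only the 2-minute adaptation periods apply in that case.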

The Ss used in this experiment were 7 female and 17 male students at 
Purdue University. The age range was from twenty to thirty-five years. One 
“normal” S was tested with glasses; the remaining Ss were tested without 
glasses. All Ss were tested monocularly, the unused eye being occluded by a 
card in the eye-shield. 

Each S was instructed as follows: “This is a vision test. It is a rather 
complete test and will require approximately 40 minutes. You are to locate 
the position of the checkerboard in the target which is the same as the target 
used in the Ortho-Rater. (All Ss had been ‘Ortho-rated’ previously.) You 
are to respond with top, bottom, right, or left for each target presented. Each 
target will be presented 10 times. In order to determine how well you see, 
very small targets will have to be presented, so make the best response you 
can for each presentation even though you are not sure of the correctness of 
your response. Relax and rest your eye during the changes of brightness 
levels. You will be given a few minutes to get used to the next level of bright- 
ness. During that period keep your eye fixed on the target background.” 


Results 


A decimal acuity score was calculated for each S by correcting the 
raw scores for chance. The following formula was used in obtaining 
this correction: $S_c = \frac{1}{3}(4C - 10)$, where $S_c$ is the corrected score and $C$ 
is the number of correct responses. 

The raw scores for each subject for a particular level of brightness 
were obtained by counting the number of correct responses on the target 
immediately following the smallest target on which at least 5 correct 
responses were made, then correcting this score by using the above for- 
mula. The decimal acuity score was obtained by interpolation. For 
example, S made a raw score of 4 on the fourth target in the series. The 
preceding target, no. 3, was the last target on which at least 5 correct 
responses were made. Target no. 3 subtends a visual angle of 1.0 giving a 
decimal acuity notation of 1.0 also. Interpolating for the interval of .20 
(the difference in decimal acuity notation for targets no. 3 and no. 
4) gave a decimal acuity notation of 1.040 for S for any one level of 
brightness. 
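
The scoring rule can be sketched as follows. With four possible positions and ten presentations, the number right less one-third of the number wrong equals (4C - 10)/3. The interpolation step, corrected score divided by ten times the .20 interval, is inferred from the worked example above rather than stated as a formula in the text, and the function names are illustrative only. The handling of negative corrected scores is not described in the source and is not modelled here.

```python
def corrected_score(correct, trials=10, alternatives=4):
    """Chance-corrected score: rights minus wrongs / (alternatives - 1).
    For 10 trials and 4 alternatives this equals (4 * correct - 10) / 3."""
    wrong = trials - correct
    return correct - wrong / (alternatives - 1)

def decimal_acuity_score(last_passed_acuity, correct_on_next,
                         step=0.20, trials=10):
    """Interpolate between the last target passed (at least 5 correct)
    and the next smaller target, using the corrected score on the latter.
    The interpolation fraction is an inference from the worked example."""
    fraction = corrected_score(correct_on_next, trials) / trials
    return last_passed_acuity + step * fraction

# Worked example from the text: target no. 3 passed (acuity 1.0),
# raw score of 4 on target no. 4.
example = decimal_acuity_score(1.0, 4)
print(round(example, 3))
```

For the example in the text the corrected score is 2, so the interpolated acuity is 1.0 + (2/10)(.20) = 1.040.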

The analysis of variance of the decimal acuities for the subnormal 
group is given in Table 1 and a similar analysis for the normal group is 
given in Table 2. The two analyses give essentially the same results 


Table 1 

Analysis of Variance of Decimal Acuities of the Subnormal Group 

Source of Variation          Sum of Squares    df    Estimate of Variance 

Between Brightness Levels         1.65           5          .329 
Between Subjects                  1.17          11          .107 
Interaction                        .56          55          .010 
Total                             3.38          71 

* Significant at the 1% level of confidence. 


Table 2 

Analysis of Variance of Decimal Acuities of the Normal Group 

Source of Variation          Sum of Squares    df    Estimate of Variance 

Between Brightness Levels          .33           5          .066 
Between Subjects                   .18          11          .016 
Interaction                        .78          55          .014 
Total                             1.29          71 

* Significant at the 1% level of confidence. 


with the exception that the normal group could be regarded as more 
homogeneous than the subnormal group. The F values found show 
that levels of brightness play an important role in determining acuity 
scores for both groups. 
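
The F values referred to here are not printed in the tables. As a minimal sketch, assuming each F is the ratio of a variance estimate to the interaction variance estimate (the error term is not named explicitly in the text), they could be obtained from Tables 1 and 2 as follows; the dictionary layout and function name are illustrative.

```python
# Variance estimates as read from Tables 1 and 2 (decimal points restored).
VARIANCE_ESTIMATES = {
    "subnormal": {"brightness": 0.329, "subjects": 0.107, "interaction": 0.010},
    "normal": {"brightness": 0.066, "subjects": 0.016, "interaction": 0.014},
}

def f_ratios(estimates):
    """F for each source, using the interaction estimate as the error term
    (an assumption; the source does not state which term was used)."""
    error = estimates["interaction"]
    return {source: value / error
            for source, value in estimates.items()
            if source != "interaction"}

f_values = {group: f_ratios(est) for group, est in VARIANCE_ESTIMATES.items()}

for group, ratios in f_values.items():
    print(group, {source: round(f, 2) for source, f in ratios.items()})
```

Under this assumption the ratio for brightness levels is much larger in the subnormal group than in the normal group, and the between-subjects ratio for the normal group is close to 1, in keeping with the remark that the normal group is the more homogeneous.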

Figure 2 shows graphically the relation between acuity and levels of 
brightness for the two groups. The mean decimal acuity for the subnormal 
group was .668 as compared to 1.057 for the normal group at the 
lowest level of brightness, viz., 3.16 footlamberts.¹ This level resulted 
in the lowest acuity for both groups. Maximum acuity was reached by 
both groups at 1000 footlamberts, the mean decimal acuity being 1.061 
for the subnormal group and 1.264 for the normal group. The subnormal 
group made a mean decimal acuity gain of .393 as compared to .203 for the 
normal group. The significance of the difference of mean gains resulted 
in a “t” value of 2.54. (A “t” value of 2.819 is required for 1% level of 
confidence, 2.074 for 5% level of confidence.) 

The significance of the difference of the slopes of straight lines fitted 
to the means of each group by the method of least squares resulted in 





Fig. 2. Variation of mean visual acuity for “Normal” and “Subnormal” Groups with 
change in level of target brightness. N for Normals = 12, N for Subnormals = 12. 
(Abscissa: brightness levels in footlamberts; separate curves for the normal and 
subnormal groups.) 


¹ The reader may be more familiar with light measurement in terms of footcandles. 
The footcandle is a photometric measure which specifies the quantity of light falling 
upon a surface. A one-candle source delivers one footcandle of illumination on a 
surface when the surface is at a distance of one foot. The footlambert is also a photo- 
metric measure, but it quantifies the amount of light coming back from a reflecting 
surface. A perfectly reflecting surface which has one footcandle of illumination on it 
will have a brightness of one footlambert. In order to measure the brightness of a 
surface it is necessary to multiply the illumination on the surface (in footcandles) by 
the overall reflectance of the surface. For example, if a surface reflects 80% of the 
light which falls on it then when one footcandle of illumination is put on the surface, 
it will have a brightness of 0.8 footlamberts. Apparent footcandle, another frequently 
used brightness term, is equivalent to footlambert. 










a “t” value of 3.15 which is significant well beyond the 1% level of con- 
fidence. This technique, which takes into consideration all of the means 
of the two groups for each level of brightness, probably gives a more 
true indication of the effect of brightness increase than does a considera- 
tion of the two extreme means, viz., at 3.16 and 1000 footlamberts of 
brightness. 
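
The fitting of straight lines by least squares can be illustrated roughly as follows. This sketch rests on several assumptions that are not in the source: the group means are reconstructed from the percentages in Table 3 multiplied by the mean acuity at 1000 footlamberts, the normal group's damaged entry at 31.6 footlamberts is read as 92.9, and the lines are fitted against the logarithm of brightness. It does not reproduce the “t” test on the slopes, which requires the individual scores.

```python
def least_squares_slope(xs, ys):
    """Slope of the least-squares straight line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Log10 of the six brightness levels (assumed abscissa for the fit).
LOG_BRIGHTNESS = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]

# Per cent of maximum acuity from Table 3; 92.9 is an assumed reading.
PERCENT = {
    "subnormal": [62.9, 70.5, 91.5, 95.4, 97.5, 100.0],
    "normal": [83.6, 91.6, 92.9, 97.5, 97.1, 100.0],
}
MAX_MEAN = {"subnormal": 1.061, "normal": 1.264}  # means at 1000 footlamberts

# Approximate group means reconstructed from the percentages.
means = {g: [p / 100.0 * MAX_MEAN[g] for p in PERCENT[g]] for g in PERCENT}
slopes = {g: least_squares_slope(LOG_BRIGHTNESS, means[g]) for g in means}

for group, slope in slopes.items():
    print(f"{group}: {slope:.3f} acuity units per log unit of brightness")
```

On these reconstructed means the fitted line for the subnormal group is the steeper of the two, which is the direction of the difference reported.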

By further analysis it was determined that the lower one-half of the 
subnormal group, i.e., the six Ss showing lowest acuity, made a gain of 
.429 as compared to .393 for the entire subnormal group. This gives 
some indication that the poorer the initial visual acuity the more benefi- 
cial an increase in target brightness becomes. 

Figure 2 also shows that, if the curves were smoothed, the subnormal 
group reached “average” visual acuity (1.0 decimal notation) at a level of 
about 40 footlamberts, with little change thereafter. Further, the acuity 
of the subnormal group at about 1000 footlamberts equaled that of the 
normal group at about 3.16 footlamberts. 

The per cent of maximum acuity is shown in Table 3. The sub- 
normal group benefited most from increased brightness as shown by a 
gain of 37.1% as compared to 26.4% for the normal group. 


Table 3 

Per Cent of Maximum Acuity * at Each Brightness Level 

Level (footlamberts)    Subnormal Group    Normal Group 

3.16                         62.9              83.6 
10                           70.5              91.6 
31.6                         91.5              92.9 
100                          95.4              97.5 
316                          97.5              97.1 
1000                        100.0             100.0 

Total Gain                   37.1              26.4 

* Maximum acuity is defined as mean acuity of each group at 1000 footlamberts. 
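
The definition in the table footnote can be applied to the two pairs of group means quoted in the text. The sketch below covers only the lowest brightness level, since the means at the intermediate levels are not reported; the names used are illustrative.

```python
# Mean decimal acuities quoted in the text, keyed by footlamberts.
MEAN_ACUITY = {
    "subnormal": {3.16: 0.668, 1000: 1.061},
    "normal": {3.16: 1.057, 1000: 1.264},
}

def percent_of_maximum(mean, maximum_mean):
    """Mean acuity expressed as a percentage of the group's mean
    at 1000 footlamberts (the table's definition of maximum acuity)."""
    return 100.0 * mean / maximum_mean

lowest_level_percent = {
    group: percent_of_maximum(values[3.16], values[1000])
    for group, values in MEAN_ACUITY.items()
}

for group, pct in lowest_level_percent.items():
    print(f"{group}: {pct:.2f} per cent of maximum at 3.16 footlamberts")
```

The results agree with the first row of Table 3 to within rounding of the published means.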


Tables 4 and 5 show the significance of the difference of means for 
the subnormal and normal groups between each level of brightness and 
every other level of brightness. 

The only significant difference between means at successive levels of 
brightness is found between 10 and 31.6 footlamberts with the subnormal 
group, as shown in Table 4. When comparing the mean at 3.16 with the 
means at each other level of brightness all are significant beyond the 1% 
level except one, viz., between means at 3.16 and 10. It seems especially 







Table 4 

Critical Ratio of Differences Among Acuity Means at Each Level of 
Brightness for the Subnormal Group 

Levels of 
Brightness    3.16    10    31.6    100 

3.16          7.40 * 
10            5.44 * 
31.6 
100 
316 
1000 

* Significant at the 1% level of confidence. 
† Significant at the 5% level of confidence. 


Table 5 

Critical Ratio of Differences Among Acuity Means at Each Level of 
Brightness for the Normal Group 

Levels of 
Brightness    3.16    10    31.6    100 

3.16          2.06    2.40 † 
10            0.34    1.55 

* Significant at the 1% level of confidence. 
† Significant at the 5% level of confidence. 


important to point out that there are no significant differences (at 1% 
confidence level) between brightness intensity 31.6 and any higher 
brightnesses within the range of brightnesses used. 

As shown in Table 5 the critical ratios for the differences among means 
for the normal group are in general considerably smaller than for the 
subnormal group, although when comparing the mean at 3.16 with means 
at all other levels of brightness all differences are significant beyond the 
5% level of confidence with one exception; the difference between the 
means at 3.16 and 10 is significant just slightly below the 5% level. 
One slight reversal can be noted, viz., between means at 100 and 316. 
With this group the fact that no significant gain (at 1% confidence level) 
in performance is obtained when the brightness is raised above 10 foot- 
lamberts may be of considerable consequence. 

The average variability calculated by averaging the sigmas for each 
group at each level of brightness was .154 and .103 (decimal notation) for 
the subnormal and normal groups, respectively. The standard deviation 
for the subnormal group for the lowest level of brightness was .150 and 
for the highest level of brightness .151. For the normal group the 
standard deviation at the lowest level was .175 and for the highest 
level .088. 

Discussion 


The findings of this experiment give additional confirmatory evidence 
that brightness is one of the primary factors in vision. Several previous 
investigations have shown conclusively that a person’s visual acuity 
increases with target brightness. The present study, however, may be 
distinguished from most of the other investigations because it showed that 
the degree of gain in terms of ability to discriminate visually fine detail 
was relatively greater for those persons having initial below normal visual 
acuity, than for those having initial above normal acuity, when target 
brightness was increased. 

It is not felt that the findings of this experiment warrant as specific 
a proposal concerning prescription of illumination intensities as was made 
by Ferree, Rand and Lewis (2): “In each case the individual needs should 
be determined and the intensity given that is required.” However, it 
does permit the more generalized statement that when visual acuity is 
the primary concern, light levels should be relatively “high” on jobs 
requiring the seeing of details where persons with reduced visual acuities 
are employed. In all likelihood little advantage would be gained for 
these people by prescribing more than approximately 31.6 footlamberts. 
For individuals with normal visual acuity it is probable that there would 
be only slight advantage in prescribing more than approximately 10 
footlamberts. 

It should be obvious why the authors of this article hesitate to rec- 
ommend specific brightness levels for specific individuals. For one 
thing, the steps used were half-log steps and somewhat gross. Also, 
when individual cases are considered, certain complicating factors may 
be encountered even in evaluating a light level on the basis of threshold 
measurements. Not the least of these may be the motivational factors 
acting on the individual due to his relating a target brightness to the light 
under which he has been accustomed to working. 

It should be borne in mind that this experiment was of a visual 
threshold nature. Tinker (3) believes that: “One should not prescribe 
illumination for suprathreshold tasks in terms of threshold measure- 
ments.” The problem of optimum light intensities to minimize fatigue 
on prolonged tasks has not been covered in this investigation. Maximum 
acuity does not necessarily imply optimum working conditions because of 
other variables which may be included in the overall situation. However, 
until a satisfactory criterion for “visual fatigue” has been ascertained 
it would seem desirable to utilize the findings from threshold experiments 
in choosing desirable light levels for “fine” tasks. Naturally 
specification of any feature of the working environment should take 
cognizance of the attitudinal viewpoint of the worker. 

The findings of this experiment may have extensive ramifications, 
particularly in two interrelated endeavors, viz., (1) establishment of 
illumination standards, and (2) selection and placement policies in 
instances wherein an employee’s visual acuity may be a factor in satis- 
factory performance. 


Summary and Conclusion 


An experiment was performed to determine whether the amount of 
increase in visual acuity, with increase of brightness on targets, differs 
markedly for persons with initial “subnormal” acuity from those with 
initial “normal” acuity. The experiment was of a threshold nature with 
subjects locating checkerboard targets under six levels of target brightness 
varying from 3.16 footlamberts to 1000 footlamberts. 

It was found that a subnormal group gained significantly more in 
visual acuity terms with an increase in target brightness than did a normal 
group. 

The data show that adequate light for seeing details is: 


(1) Between 10 and 30 footlamberts for those with normal vision; 
(2) Somewhere between 30 and 40 footlamberts for those with sub- 
normal vision. 




Received October 28, 1948. 
Early publication. 


References 


1. Ferree, C. E., and Rand, Gertrude. The effect of intensity of illumination on the 
near point of vision and a comparison of the effect for presbyopic and non- 
presbyopic eyes. Trans. Illum. Engng. Soc., 1933, 28, 590-611. 

2. Ferree, C. E., Rand, Gertrude, and Lewis, E. F. The effect of increase of intensity 
of light on visual acuity of presbyopic and non-presbyopic eyes. Trans. Illum. 
Engng. Soc., 1934, 29, 296-313. 

3. Tinker, M. A. Illumination standards for effective and easy seeing. Psychol. Bull., 
1947, 44, 435-450. 

















Book Reviews 


Lawshe, Jr., C. H. Principles of personnel testing. New York: McGraw- 
Hill Book Co., 1948. Pp. 227. $3.50. 


This book is an elementary treatment of the problems involved in the 
testing of employees for purposes of selection. The expected phases are 
covered, namely test construction and validation (Chaps. II, III, IV and 
XIII), review of previous findings concerning the effectiveness of em- 
ployment tests (Chaps. V through XII), and establishment of testing 
programs (principally Chap. XIV). 

The approach taken by Lawshe will meet the approval of industrial 
psychologists. He is concerned with many of the problems in testing 
that can be appreciated only by one who has worked directly in industry. 
However, in the reviewer’s opinion the book gives only a general overview 
of the field. The coverage of important problems, methods, and findings 
is spotty, and any reader, virgin to testing, may obtain an incorrect, and 
certainly an incomplete, picture. 

The problems dealing with test construction and validation are ade- 
quate as far as they go—but they do not go far enough. The concept of 
reliability is not mentioned either in connection with criteria or with 
tests. There is no treatment of the combination or weighting of tests in 
a battery. Lawshe states that the purpose of the book is to serve as an 
aid to management. But the level at which the book is cast suggests 
that the author is underestimating the intellectual capacities of his 
potential readers. The point of view seems to be that concepts of second 
order difficulty, even though they be fundamental in nature, have no 
place in an introductory presentation. This simplified treatment is 
likely to convey to the wrong people the erroneous impression that any- 
one can develop and operate a testing program without getting into 
difficulties. 

The chapters concerned with reporting previous findings on the 
validity of tests will be disappointing to many. It is by no means an 
extensive review. Rather these chapters are intended to present illus- 
trative findings concerning the usefulness of different types of tests for 
various kinds of jobs. Several comments seem pertinent in this con- 
nection. Not a single example of the work of the U. S. Employment 
Service on aptitude testing is cited. Yet surely Stead and Shartle’s now 
classic ‘Occupational Counseling Techniques’ which summarizes so 
many of these excellent investigations deserves mention here. However, 
of considerable value are the findings of a number of investigations not 
published elsewhere. Of the validation studies cited, only a very few 
with negative findings are given. Thus tests are hardly ever put in an 
unfavorable light. It reminds one of the optimistic descriptions of 
validity given in the manuals of directions of so many published tests. 
Any members of management or unions who believe this to be the true 
picture are in for a sorry disappointment. Even a cursory review of 
published reports will indicate that there is considerable variation in the 
effectiveness of any given test applied to different groups of workers on 
the same job. 

Yet in fairness to the author it should be pointed out that he does not 
subscribe to the rather extreme views as presented in his publisher’s 
advertising. Thus while the publisher claims that “From now on you can 
place the right person in the right job every time!”, Lawshe emphasizes 
that tests are by no means a cure-all and can simply increase the proba- 
bility of selecting better employees. 

The third topic, establishment of testing programs, is likely to be 
dismissed as being another set of “practical” rules. This would be an 
error. While one might have hoped for a more expanded treatment, 
Lawshe here deals with most important problems. Today any book in 
this field purporting to be a “Principles” of employee testing would be 
woefully inadequate if it avoided such areas as supervisory support, 
budget, labor unions, personnel records and management reports. The 
day when the psychologist’s task begins with the writing of items and 
ends with the computation of the validity coefficient is past, if it ever 
existed. Lawshe recognizes this and entertains for discussion certain of 
the important implications of testing in its larger setting of personnel 
problems and labor relations. 

In sum, the reviewer’s chief criticisms of this book are omissions of 
fundamental problems and concepts, and the elementary nature of the 
presentation. Those who use the book as a text either in college courses 
in testing or in similar courses for members of management or labor 
unions will undoubtedly find it necessary to provide supplementary 
material and discussion. 

Edwin E. Ghiselli 

University of California 

Berkeley, California 


Chapin, F. Stuart. Experimental designs in sociological research. New 
York: Harper and Brothers, 1947. Pp. x+206. $3.00. 


The breaking away of psychology from philosophy and the attempts 
of psychologists to convert their discipline into an exact science resulted 
in a professional compartmentalization, the unfortunate effects of which 
have only recently become clear to many psychologists. In particular, 
we have been insufficiently aware of the attempt of sociologists to estab- 
lish their field as a science, for they broke away from philosophy after we 
did and our natural science orientation has generally kept us from ob- 
serving their efforts and progress. 

Several works have in recent years attempted to review and con- 
solidate the scientific gains made by sociologists: Lundberg’s Foundations 
of Sociology and Greenwood’s Experimental Sociology come to mind as 
illustrations. Chapin’s little volume is another distinctive contribution. 
As he puts it in his preface, his purpose is “to illustrate the method of 
experimental design by reproducing concrete studies,” to provide “a 
source book of examples of specific application (of the fundamental logic 
of experimental designs) analyzed in some detail” (ix). This is done by 
analyzing the methods used in nine experimental studies. Both the 
methods and the findings are of interest to applied psychologists. 

Chapin classifies the experimental designs used in sociological re- 
search under three headings: the cross-sectional study, the projected 
(before and after) design, and the ex-post-facto (retrospective) design. 
His interest in these types of experimentation arises from the fact that 
they can be used in real-life situations and are not limited to the labora- 
tory or classroom. He points out, for example, that social legislation 
(e.g., slum clearance and work relief) is social experimentation, and his 
interest is in experimental designs which can be used in the evaluation 
of the effects of such experimentation. 

The detailed analyses of experimental designs are stimulating in that 
they point up the possibilities of research in practical situations, and 
helpful in that they make clear the weaknesses and advantages of various 
procedures. Chapin brings out, for example, the inability of the social 
scientist to emulate the natural scientist in controlling all but one of the 
variables in an experiment. He analyzes the alternatives and shows how 
one of the best (randomization) is not generally feasible in real life (e.g., 
WPA workers were selected not only on the basis of need but also on the 
basis of employability, thereby making them differ from the direct-relief 
clients with whom they were to be compared in a study of the effects of 
work relief on morale). He then examines the use of available experi- 
mental groups and control groups as a solution. One such study (99-124) 
evaluating the effect of high-school graduation on economic adjustment 
(an ex-post-facto study) is analyzed to show the relative effects, on both 
numbers and definitiveness of results, of the precise matching of indi- 
viduals and of the grosser matching of distributions. Matched distribu- 
tions yielded experimental and control groups of 145 each from an original 
group of 1194, contrasted with groups of 23 matched individuals each 
(matched for six variables). But the former method showed an insignifi- 
cant difference in the economic adjustments of graduates and drop-outs, 
whereas the latter method showed that the graduates were clearly more 
successful. The emphasis on the need for the repetition of experiments 
in similar situations, in order to test the justifiability of generalizations, 
is also noteworthy. 

A chapter and appendices dealing with sociometric scales should be of 
especial interest to psychologists. Some of these, both psychological 
and sociological, are already familiar, but the emphasis is on new instru- 
ments such as Chapin’s Social Participation Scale and the revision of his 
Social Status Scale. 

There is an interesting but, in this reviewer’s opinion, unsuccessful 
attempt to justify cause-and-effect conclusions from indices of associa- 
tion. Chapin launches it by pointing out that the only alternatives are 
belief in chaos, magic, or means-end relationships. But the logic is 
fallacious, because the existence of such alternatives does not make it 
necessary to conclude that one of a particular pair of associated 
variables is the cause of the other. One may accept the principle of 
causation without being justified in concluding that, since the differences 
in the social adjustments of WPA workers and recipients of direct relief 
are statistically significant, and since the groups were matched on seven 
factors, work relief has a more beneficial effect than direct relief (p. 42). 
It is conceivable that better-adjusted relief clients were selected for work 
relief (the reviewer knows of WPA projects in which this was the standard 
practice), in which case superior adjustment was the cause of receiving 
work relief, rather than work relief being the cause of superior adjust- 
ment. Belief in causation does not indicate the direction of specific cause- 
and-effect-relationships. Statistics show association; the attribution of 
causal connections is a process of deduction. But this recurrent fallacy 
is of minor importance, provided the reader is aware of it. 

This is a valuable book for those who are concerned with the design 
of experiments in social, clinical, educational, and vocational psychology, 
whether as research workers or as instructors. Its exposition is clear, 
it is rich in illustrative material, and the research principles which it 
illustrates are of widespread importance in the social sciences. It will 
also serve to introduce psychologists to an aspect of contemporary 
sociology of which many are too unaware. 


Donald E. Super 


Department of Guidance, Teachers College 
Columbia University 













J. G. Darley, Chairman, et al. The use of tests in college. Washington, 
D. C.: American Council on Education, 1947. Pp. vii+82. $1.00. 

Froehlich, Clifford P., and Benson, Arthur L. Guidance testing. Chi- 
cago: Science Research Associates, Chicago, 1948. Pp. viii+104. 
$1.00. 


Clarification of problems through fresh insights regarding them can 
be a fruitful approach. Although this American Council on Education 
publication is addressed to a college audience, it contains much of value 
for the users of tests at most educational levels. 

The frame of reference centers upon five questions, “Who shall be 
admitted?”, “How shall students choose appropriate curriculums?”, 
“How shall we counsel students?”, “How shall we measure outcomes?”, 
“How do we measure behavior?” This method makes the material 
useful to college and secondary school administrators, counselors, and 
thoughtful instructors. The recommendations regarding the use of tests 
are properly cautious and practicable. 

A major value of the publication is its interpretative approach to test 
use in terms of generalizations, rather than use of a multitude of specifics 
related to particular tests. The reader is urged to consider carefully 
Section V, How shall we measure outcomes? (pp. 43-57). The presentation 
of an examination structure for colleges on pages 45 and 46 supplies an 
excellent general framework for considering test results in relationship 
to other types of data and several kinds of personnel workers. 

In the reviewer’s opinion, the objectives and implications stated in 
the Foreword by Dean T. R. McConnell and Dean E. G. Williamson are 
well met. It is his opinion also that it is a somewhat restricted but 
generally excellent presentation. Strongly recommended reading for 
student personnel workers and administrators in educational institutions. 

Froehlich and Benson say: “This book is addressed to those individuals 
who are faced with the responsibility of carrying on a guidance program 
in which they must directly or indirectly administer and interpret tests, 
even though their training in tests and measurements is limited (p. v).” 

Guidance Testing poses again the problem of waiting until workers 
are competent before using tests or urging that they gain this competence 
by using tests cautiously. The book is definitely posited on the very 
practical philosophy that naive personnel in education will use tests and 
that it is sensible to help them to avoid errors in practice. 

A major strength is the frank facing of the fact that proper use of 
tests requires statistical knowledge so they include descriptions of simple 
statistical methods and interpretations. This is in line with the view- 
points of Bingham, Crawford, Darley, and others. 

The authors present typical tests of various kinds under the rubrics
scholastic aptitude, achievement, interest, personal adjustment, and special
aptitude (pp. 23-46). A footnote (p. 23) calls attention to the fact that
the authors use these tests as examples, not as a selected list of the best 
instruments. Although agreement on the best instruments is probably 
impossible, some evaluation of excellence might have been helpful. One 
of the most frequently asked and most legitimate questions of the new 
test user is, “What are the best tests for my purposes?” Perhaps the
answer is, “Consult the nearest competent person.”

The feeling of the reviewer is that this is a helpful and useful book. 
This is particularly true if the audience for which it is intended follows 
through with graduate training which will make the book no longer 
needed. 


Milton E. Hahn 
University of California at Los Angeles 


Erickson, Clifford E. (Editor). A basic text for guidance workers. New 
York: Prentice-Hall, Inc., 1947. Pp. 566. $4.25. 


The editor of this volume, Dr. Erickson, who is Professor of Education 
and Director of the Institute of Counseling, Testing, and Guidance at 
Michigan State College, has written several previous books in the guidance
field. Drawing heavily upon Michigan State College guidance
instructors as well as upon a variety of guidance experts in other school
systems, this text “attempts to portray many different aspects of the 
guidance program and at the same time to indicate the extent of some 
of the specializations within the field as a whole.” In the preface the 
editor specifies that the book is intended as a basic or beginning text for 
training school counselors. 

In general, the content of the book attempts to give the guidance 
worker (particularly the secondary school teacher) an over-view of the 
guidance field including purposes, techniques, and administration of the 
guidance program. Although the book fulfills its avowed aim to a large 
degree, the quality of the chapters varies widely and some overlap is 
evident as is often the case with such symposia. Aside from certain minor 
points the best chapters appear to be: the aims, objectives, and principles 
of the guidance movement (C. E. Erickson), interviewing techniques (S. 
A. Hamrin), therapeutic counseling (H. B. Pepinsky), helping pupils 
with their problems (P. L. Dressel), the community occupational survey 
(Elizabeth K. Wilson), the role of work experience (C. A. Weber), placement
and follow-up services (L. O. Brockmann and L. Smith), and
organizing the guidance program (C. M. Horn). 

The major criticisms of the volume can be summarized briefly: 
unevenness in quality and treatment of material as well as in level of
difficulty (the chapter on therapeutic counseling may be heavy going for
the average high school counselor); a tendency to apportion too much 
space to less important topics (case-study techniques and working with 
home and community); lack of critical evaluation of occupational infor- 
mation sources and of testing instruments; lack of functional information 
about the world of work in the job levels where most secondary school 
students will be employed. 

The commendable features probably outweigh these criticisms, how- 
ever. There is an admirable emphasis on the primary importance of 
individual counseling. The many good, practical suggestions are of 
great potential value to the guidance worker and should earn the authors 
many a word of praise from hard-pressed counselors. Another fine 
feature is the inclusion of many illustrative forms and excellent bibli- 
ographies. 

Perhaps the most discouraging point raised by this book is the 
tremendous store of material, skills, and information the counselor must 
have as minimum working equipment. Insofar as guidance techniques 
can be put across in book form to the beginning counselor, A Basic Text 
for Guidance Workers is effective in attaining its aim. 

William A. McClelland 


Brown University, 
Providence, R. I. 


Clarke, H. Harrison. The application of measurement to health and 
physical education. New York: Prentice-Hall Incorporated, 1945. 
$5.00 Pp. 415. 


This text is organized on a functional outline. After considering
some of the fundamentals underlying testing in health and physical education,
the measurement of physical fitness, of social efficiency, and of physical
education skills and appreciations are considered in turn.

In the section on physical fitness, there is the usual discussion of 
medical and sensory tests, of cardio-vascular tests, and a section on 
measurements and estimates of nutritional status. There is a well- 
written and sound treatment of the problems and possibilities associated 
with the technique of somatotyping. This is followed by an excellent 
discussion of measurement in the field of posture. 

The author gives a somewhat undue amount of space to the so-called 
“physical fitness index,” which is the percentage residual from the regression
value of a general strength test. In the opinion of the present
writer, far too much value is assigned to this test, and it is assigned 
attributes which it does not deserve. While strength is important in 
physical fitness, it does not deserve the reverence given here. The 
author again reverts to this test in the next part of his book. 







In the part of the book given over to tests of social efficiency, the 
author has introduced a section that is new to testing textbooks in this 
field. He discusses—and gives numerous references to—a number of 
tests in the field of personality studies. This marks an advance in this 
field and these tests should be given more prominence in physical educa- 
tion and health studies as time goes on and as these tests improve. 

In the field of skill tests, because of the large numbers of such tests, 
the author has had to choose a few of the more important ones to de- 
scribe, and to give bibliographical references for the others. His choices 
have, on the whole, been good. The same has been true of his presenta- 
tion of knowledge tests. 

The text closes with a discussion of the administrative problems in 
physical and health education testing, and an appendix devoted to the 
not-uncommon attempt to compress a semester’s course in statistical 
methods into a chapter. 

On the whole, the book presents a number of fresh viewpoints and 
is a useful text and reference book in its field. 


C. H. McCloy 
The State University of Iowa 


Ross, C. C. Measurement in Today’s Schools. 2nd Ed.; New York: 
Prentice-Hall, Inc., 1947. Pp. xviii+597. $4.50. 


Ross, C. C. Chapter Exercises and Tests to Accompany Measurement in 
Today’s Schools. 2nd Ed., New York: Prentice-Hall, Inc., 1947. 
Pp. vi+74. 


The second edition of this elementary textbook in educational meas- 
urement appears six years after the first edition. The organization, 
chapter and section headings, and approximately ninety-five per cent of 
the content remain unchanged. In general the references have been 
brought up to date and the content modified sufficiently to incorporate 
them. The chapter tests which appeared at the end of each chapter in 
the first edition have been supplemented by appropriate problems and 
placed in a consumable workbook to accompany the text. 

The text with its accompanying exercise book is designed specifically 
for a first course in educational measurement. It is well organized to 
give the beginning teacher and partially trained school administrator an 
integrated insight into the theory of measurement, descriptive statistics 
and individual differences as they relate to school organization and in- 
struction. Almost one-half of the text is devoted to the uses of measure- 
ment in motivation, learning, diagnosis, marking, grouping, promotion, 
guidance and evaluation. Appropriate emphasis is placed on the con- 
struction of informal teacher-made tests. 













While the approach to the use of measurements is functional and in- 
tegrative it is also traditional and uncritical. In the cataloguing of re- 
search the point of view is that of the professor of measurement rather 
than that of the director of school organization and learning. The extent 
to which our present knowledge of individual and trait differences in the 
schools points to new uses of measurement and needed reforms in school 
organization and practice is not sensed. Nevertheless these two books 
rate high among the teachable books available in the field. 


Walter W. Cook 
University of Minnesota 





Erratum 


In the December 1948 issue of the Journal of Applied Psychology, an 
error occurred in the article, “The Effectiveness of Intelligence Tests 
in the Selection of Workers” by E. E. Ghiselli and C. W. Brown. On 
page 576 the last eight lines of type should have been inserted following 
the first line on page 577. 





Erratum 


In the December 1948 issue of the Journal of Applied Psychology, 
under New Books, books by the following authors: H. L. Goldberg, K. 
Goldstein, D. T. V. Moore, Strauss and Lehtinen, L. R. Wolberg, and 
L. Szondi, were erroneously listed as being published by a book seller 
by the name of M. W. Drexler. The real publisher is New York: Grune 
and Stratton, Inc. 





New Books, Monographs, and Pamphlets 


Books, monographs, and pamphlets for listing and possible review should be sent to 
Donald G. Paterson, Editor, Department of Psychology, University 
of Minnesota, Minneapolis 14, Minnesota 


Studies in psychosomatic medicine. Franz Alexander and Thomas M. 
French, Editors. New York: Ronald Press Co., 1948. Pp. 568. 
$7.50. 

Some psychological apparatus: a classified bibliography. T. G. Andrews. 
Psychological Monographs No. 289. Washington, D. C.: American 
Psychological Association, 1948. Pp. 38. 

Attitudes of German prisoners of war: a study of the dynamics of national- 
socialistic followership. H. L. Ansbacher. Psychological Mono- 
graphs No. 288. Washington, D. C.: American Psychological Asso- 
ciation, 1948. Pp. 42. 

Foundations of psychology. Edwin G. Boring, Herbert S. Langfeld, and
Harry P. Weld. New York: John Wiley and Sons, Inc., 1948. Pp. 
632. $4.00. 

Current trends in clinical psychology. A. W. Combs, et al. New York: 
The New York Academy of Sciences, 1948. Pp. 62. 

The American woman in modern marriage. Sonya Ruth Das. New 
York: Philosophical Library, 1948. Pp. 185. $3.75. 

Hearing and deafness. Hallowell Davis, Editor. New York: Murray 
Hill Books, Inc., 1948. Pp. 496. $5.00. 

An application of the level of aspiration experiment to the study of personality. 
Sibylle K. Escalona. New York: Bureau of Publications, Teachers 
College, Columbia University, 1948. Pp. 132. $2.10. 

The use of training films in department and specialty stores. Harry M. 
Hague. Boston: Harvard Business School, 1948. Pp. 147. $1.50. 

Understandable psychiatry. Leland E. Hinsie. New York: The Mac- 
millan Co., 1948. Pp. 359. $4.50. 

Your job. Fritz Kaufmann. New York: Harper and Brothers, 1948. 
Pp. 238. $2.75. 

A study of thumb- and finger-sucking in infants. Mary S. Kunst. Psychological
Monographs No. 290. Washington, D. C.: American Psychological
Association, 1948. Pp. 71.

Graduate training for educational personnel work. Corinne LaBarre. 
Washington, D. C.: American Council on Education, 1948. Pp. 54.
$1.00.










The commonsense psychiatry of Dr. Adolf Meyer. Alfred Lief. New 
York: McGraw-Hill Book Co., Inc., 1948. Pp. 677. $6.50. 

The strategy of job finding. George J. Lyons and Harmon C. Martin. 
New York: Prentice-Hall, Inc., 1948. Pp. 408. $3.25. 

The open self. Charles Morris. New York: Prentice-Hall, Inc., 1948. 
Pp. 179. $3.00. 

Educational psychology. Harvey A. Peterson. New York: The Mac- 
millan Co., 1948. Pp. 550. $4.00. 

Training employees and managers. Earl G. Planty, William S. McCord, 
and Carlos A. Efferson. New York: The Ronald Press Co., 1948. 
Pp. 278. $5.00. 

The emotions. Jean-Paul Sartre. New York: The Philosophical Library, 
1948. Pp. 97. $2.75. 

The teacher as counselor. Donald J. Shank, et al. Washington, D. C.: 
American Council on Education, 1948. Pp. 48. $.75. 

The legend of Henry Ford. Keith Sward. New York: Murray Hill 
Books, Inc., 1948. Pp. 550. $5.00. 

Van Allyn methods manual. Keith Van Allyn. Palo Alto, Calif.: Surveys, 
Inc., 1948. Pp. 117. Manual plus 25 Qualification Inventories 
$7.50. 

Cybernetics. Norbert Wiener. New York: John Wiley and Sons, Inc., 
1948. Pp. 194. $3.00. 

Pediatrics and the emotional needs of the child. Helen L. Witmer, Editor. 
New York: The Commonwealth Fund, 1948. Pp. 180. $1.50. 

Diagrams of the unconscious. Werner Wolff. New York: Grune and
Stratton, Inc., 1948. Pp. 423. $8.00. 

Personnel management and industrial relations. Third Edition. Dale 
Yoder. New York: Prentice-Hall, Inc., 1948. Pp. 894. $5.00.

Exploring individual differences. Committee on Measurement and
Guidance. Washington, D. C.: American Council on Education,
1948. Pp. 110. $1.50.

Exploring a first grade curriculum. New York Board of Education. 
Publication No. 30. New York: Bureau of Reference, Research and 
Statistics, Board of Education, 1947. Pp. 104. $.50. 

Influencing and measuring employee attitudes. Personnel Series Number 
113. New York: American Management Association, 1948. Pp. 55. 
$1.00. 

Problems and experience under the labor-management relations act. Per- 
sonnel Series Number 115. New York: American Management 
Association, 1948. Pp. 35. $.75. 

New patterns of employee relations. Personnel Series Number 117. New 
York: American Management Association, 1948. Pp. 50. $1.00. 











1948 DIRECTORY 


AMERICAN PSYCHOLOGICAL ASSOCIATION 
1515 MASSACHUSETTS AVENUE NORTHWEST 
WASHINGTON 5, D. C. 


Contains biographical and geographical
lists of members, by-laws, and lists of
members of the eighteen divisions of the
American Psychological Association.


438 pages $3.00 


























Reprinted: 


OPINION-ATTITUDE
METHODOLOGY 


by 
QUINN McNEMAR 


$1.25 


This popular issue of the Psycho- 
logical Bulletin has been reprinted 
so that it is again available. 


It presents an appraisal of the tech- 
niques and methods used in opinion 
and attitude research. 


American Psychological 
Association 


1515 Massachusetts Avenue N. W. 
Washington 5, D.C. 

















[Price table for back volumes of the journal; only the entries below are legible.]

Net price, Volumes 1 through 32 $126.70

The following discounts apply:
10% on orders of $50.00 and over
20% on orders of $100.00 and over
30% on orders of $150.00 and over
Current subscriptions and orders for back numbers should be addressed to: 


AMERICAN PSYCHOLOGICAL ASSOCIATION, INC. 
1515 Massachusetts Avenue, N. W. 
Washington 5, D. C. 





From the Best Seller List of the American
Psychological Association
FOR WINTER 1948-1949


McNemar, Quinn. Opinion-attitude methodology. Psycho- 
logical Bulletin, 1946, #4 (July). 
Andrews, T. G. Some psychological apparatus: a classified
bibliography. Psychological Monographs: General and
Applied, 1948, #289.
Wolfle, Helen M. 1948 Directory, American Psychological Association.


Snyder, Wm. U. The present status of psychotherapeutic 
counseling. Psychological Bulletin, 1947, #4 (July). 

















